
Machine Learning Academic Digest [9.15]

arXiv每日学术速递

Visit arxivdaily.com, which covers CS | Physics | Math | Economics | Statistics | Finance | Biology | Electrical Engineering, with search, bookmarking, and more!


cs.LG: 107 papers today


Large Language Models (12 papers)

【1】Inpainting-Guided Policy Optimization for Diffusion Large Language Models
Link: https://arxiv.org/abs/2509.10396

Authors: o, Mengchen Liu, Jing Huang, Miao Liu, Chenyu Wang, Bo Liu, Yuandong Tian, Guan Pang, Sean Bell, Aditya Grover, Feiyu Chen
Comments: preprint; 21 pages
Abstract: Masked diffusion large language models (dLLMs) are emerging as promising alternatives to autoregressive LLMs, offering competitive performance while supporting unique generation capabilities such as inpainting. We explore how inpainting can inform RL algorithm design for dLLMs. Aligning LLMs with reinforcement learning faces an exploration challenge: sparse reward signals and sample waste when models fail to discover correct solutions. While this inefficiency affects LLMs broadly, dLLMs offer a distinctive opportunity--their inpainting ability can guide exploration. We introduce IGPO (Inpainting Guided Policy Optimization), an RL framework that strategically inserts partial ground-truth reasoning traces during online sampling. Unlike providing full solutions, inpainting steers exploration toward promising trajectory spaces while preserving self-generated reasoning, bridging supervised fine-tuning and reinforcement learning. We apply IGPO to group-based optimization methods such as GRPO, where exploration failures cause zero advantages and gradients. IGPO restores meaningful gradients while improving sample efficiency. We also propose supervised fine-tuning on synthetically rewritten concise traces that better align with dLLM generation patterns. With additional techniques including entropy-based filtering, our training recipe yields substantial gains across three mathematical benchmarks--GSM8K, Math500, and AMC--achieving new state-of-the-art results for full-attention masked dLLMs.
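
The exploration failure IGPO targets is easy to see in code: GRPO normalizes rewards within a sampled group, so an all-failure group yields zero advantages and therefore no gradient. Below is a minimal sketch of the group-relative advantage (our illustration, not the authors' implementation); inpainting's role is to turn some zero-reward samples into successes so informative advantages reappear.

```python
import torch

def group_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """GRPO-style group-relative advantage: rewards normalized within a group."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# All-failure group: every advantage is zero, so the policy receives no gradient.
print(group_advantages(torch.zeros(8)))
# After inpainting partial ground-truth traces into some generations, a few
# samples succeed and non-zero, informative advantages return.
print(group_advantages(torch.tensor([0., 0., 1., 0., 1., 0., 0., 0.])))
```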


【2】Robot guide with multi-agent control and automatic scenario generation with LLM
Link: https://arxiv.org/abs/2509.10317

Authors: D. Moskovskaya, Anton D. Moscowsky
Comments: 14 pages, 5 figures, 2 tables, 1 demo-video and repository link
Abstract: The work describes the development of a hybrid control architecture for an anthropomorphic tour guide robot, combining a multi-agent resource management system with automatic behavior scenario generation based on large language models. The proposed approach aims to overcome the limitations of traditional systems, which rely on manual tuning of behavior scenarios. These limitations include manual configuration, low flexibility, and lack of naturalness in robot behavior. The process of preparing tour scenarios is implemented through a two-stage generation: first, a stylized narrative is created, then non-verbal action tags are integrated into the text. The multi-agent system ensures coordination and conflict resolution during the execution of parallel actions, as well as maintaining default behavior after the completion of main operations, contributing to more natural robot behavior. The results obtained from the trial demonstrate the potential of the proposed approach for automating and scaling social robot control systems.


【3】Prompt Injection Attacks on LLM Generated Reviews of Scientific Publications
Link: https://arxiv.org/abs/2509.10248

Authors: per
Abstract: The ongoing intense discussion on rising LLM usage in the scientific peer-review process has recently been mingled by reports of authors using hidden prompt injections to manipulate review scores. Since the existence of such "attacks" - although seen by some commentators as "self-defense" - would have a great impact on the further debate, this paper investigates the practicability and technical success of the described manipulations. Our systematic evaluation, using 1k reviews of 2024 ICLR papers generated by a wide range of LLMs, shows two distinct results: I) very simple prompt injections are indeed highly effective, reaching up to 100% acceptance scores. II) LLM reviews are generally biased toward acceptance (>95% in many models). Both results have great impact on the ongoing discussions on LLM usage in peer-review.


【4】Population-Aligned Persona Generation for LLM-based Social Simulation
Link: https://arxiv.org/abs/2509.10127

Authors: u, Zheyuan Xiao, Max Xiong, Yuxuan Lei, Tianfu Wang, Jianxun Lian, Kaize Ding, Ziang Xiao, Nicholas Jing Yuan, Xing Xie
Abstract: Recent advances in large language models (LLMs) have enabled human-like social simulations at unprecedented scale and fidelity, offering new opportunities for computational social science. A key challenge, however, is the construction of persona sets that authentically represent the diversity and distribution of real-world populations. Most existing LLM-based social simulation studies focus primarily on designing agentic frameworks and simulation environments, often overlooking the complexities of persona generation and the potential biases introduced by unrepresentative persona sets. In this paper, we propose a systematic framework for synthesizing high-quality, population-aligned persona sets for LLM-driven social simulation. Our approach begins by leveraging LLMs to generate narrative personas from long-term social media data, followed by rigorous quality assessment to filter out low-fidelity profiles. We then apply importance sampling to achieve global alignment with reference psychometric distributions, such as the Big Five personality traits. To address the needs of specific simulation contexts, we further introduce a task-specific module that adapts the globally aligned persona set to targeted subpopulations. Extensive experiments demonstrate that our method significantly reduces population-level bias and enables accurate, flexible social simulation for a wide range of research and policy applications.
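
The global-alignment step lends itself to a short sketch: weight each generated persona by the ratio of the reference density to the empirical density of its trait score, then resample. A toy one-trait version (the Gaussian reference and beta-skewed pool are our assumptions, not the paper's data):

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(0)
traits = rng.beta(2, 5, size=10_000)      # toy pool of generated personas' trait scores
target = norm(loc=0.5, scale=0.15)        # assumed reference psychometric distribution

kde = gaussian_kde(traits)                # empirical density of the generated pool
w = target.pdf(traits) / kde(traits)      # importance weights: p_ref / p_generated
w /= w.sum()
aligned = rng.choice(traits, size=5_000, replace=True, p=w)  # population-aligned pool
```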


【5】HEFT: A Coarse-to-Fine Hierarchy for Enhancing the Efficiency and Accuracy of Language Model Reasoning
Link: https://arxiv.org/abs/2509.09801

Authors: ill
Abstract: The adaptation of large language models (LLMs) to specialized reasoning tasks is fundamentally constrained by computational resources. Parameter-Efficient Fine-Tuning (PEFT) methods have emerged as a powerful solution, yet the landscape of these techniques is diverse, with distinct methods operating in either the model's weight space or its representation space. This paper investigates the hypothesis that a synergistic combination of these paradigms can unlock superior performance and efficiency. We introduce HEFT (Hierarchical Efficient Fine-Tuning), a novel hierarchical adaptation strategy that composes two distinct PEFT methods in a coarse-to-fine manner: first, a broad, foundational adaptation in the weight space using Low-Rank Adaptation (LoRA), followed by a precise, surgical refinement of internal activations using Representation Fine-Tuning (ReFT). We evaluate this approach by fine-tuning a Llama-2-7B model on the BoolQ benchmark, a challenging dataset for inferential reasoning. Our results reveal a profound synergistic effect. A model fine-tuned for only three epochs with our HEFT strategy achieves an accuracy of 85.17%, exceeding the performance of models trained for 20 epochs with either LoRA-only (85.05%) or ReFT-only (83.36%) methodologies. This work demonstrates that the thoughtful composition of PEFT methods is a potent algorithmic innovation, offering a more efficient and effective path toward advancing the reasoning capabilities of language models. By achieving superior results with a fraction of the computational budget, our findings present a principled approach to overcoming the obstacles inherent in adapting large-scale models for complex cognitive tasks.
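
HEFT's coarse-to-fine composition can be sketched in plain PyTorch: stage one learns a low-rank weight update (LoRA) on a frozen layer; stage two freezes that and learns a LoReFT-style low-rank edit applied directly to the hidden representation. A schematic under those assumptions, not the paper's code:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Stage 1: frozen base weight plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

class ReFTIntervention(nn.Module):
    """Stage 2: LoReFT-style edit h -> h + R^T (W h + b - R h) in a rank-r subspace."""
    def __init__(self, d: int, r: int = 4):
        super().__init__()
        self.R = nn.Parameter(torch.randn(r, d) * 0.01)
        self.W = nn.Linear(d, r)

    def forward(self, h):
        return h + (self.W(h) - h @ self.R.T) @ self.R

layer = LoRALinear(nn.Linear(768, 768))   # trained first, then frozen
reft = ReFTIntervention(768)              # then only this refinement is trained
out = reft(layer(torch.randn(2, 768)))
```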


【6】One Head, Many Models: Cross-Attention Routing for Cost-Aware LLM Selection
Link: https://arxiv.org/abs/2509.09782

Authors: ulishetty, Mani Kishan Ghantasala, Keerthy Kaushik Dasoju, Niti Mangwani, Vishal Garimella, Aditya Mate, Somya Chatterjee, Yue Kang, Ehi Nosakhare, Sadid Hasan, Soundar Srinivasan
Abstract: The proliferation of large language models (LLMs) with varying computational costs and performance profiles presents a critical challenge for scalable, cost-effective deployment in real-world applications. We introduce a unified routing framework that leverages a single-head cross-attention mechanism to jointly model query and model embeddings, enabling dynamic selection of the optimal LLM for each input query. Our approach is evaluated on RouterBench, a large-scale, publicly available benchmark encompassing diverse LLM pools and domains. By explicitly capturing fine-grained query-model interactions, our router predicts both response quality and generation cost, achieving up to 6.6% improvement in Average Improvement in Quality (AIQ) and 2.9% in maximum performance over existing routers. To robustly balance performance and cost, we propose an exponential reward function that enhances stability across user preferences. The resulting architecture is lightweight, generalizes effectively across domains, and demonstrates improved efficiency compared to prior methods, establishing a new standard for cost-aware LLM routing.
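
The routing idea reduces to a compact module: a query embedding attends over a table of model embeddings with a single head, and small heads predict per-model quality and cost, which an exponential reward trades off. The sketch below is our guess at the wiring; the exact architecture, reward shape, and the `lam` preference knob are assumptions:

```python
import torch
import torch.nn as nn

class CrossAttnRouter(nn.Module):
    def __init__(self, d: int, n_models: int):
        super().__init__()
        self.model_emb = nn.Parameter(torch.randn(n_models, d) * 0.02)
        self.q_proj = nn.Linear(d, d)
        self.k_proj = nn.Linear(d, d)
        self.v_proj = nn.Linear(d, d)
        self.quality_head = nn.Linear(d, 1)
        self.cost_head = nn.Linear(d, 1)

    def forward(self, query_emb: torch.Tensor, lam: float = 1.0):
        q = self.q_proj(query_emb).unsqueeze(1)                     # (B, 1, d)
        k, v = self.k_proj(self.model_emb), self.v_proj(self.model_emb)
        attn = torch.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)  # (B, 1, n)
        fused = attn.transpose(1, 2) * v + self.model_emb           # query-conditioned (B, n, d)
        quality = self.quality_head(fused).squeeze(-1)              # predicted response quality
        cost = self.cost_head(fused).squeeze(-1)                    # predicted generation cost
        score = quality * torch.exp(-lam * cost)                    # assumed exponential reward
        return score.argmax(dim=-1), quality, cost

router = CrossAttnRouter(d=64, n_models=5)
choice, _, _ = router(torch.randn(3, 64))   # best model index per query
```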


【7】A meta-analysis on the performance of machine-learning based language models for sentiment analysis
Link: https://arxiv.org/abs/2509.09728

Authors: de, Jonas Klingwort, Christian Borgs
Abstract: This paper presents a meta-analysis evaluating ML performance in sentiment analysis for Twitter data. The study aims to estimate the average performance, assess heterogeneity between and within studies, and analyze how study characteristics influence model performance. Using PRISMA guidelines, we searched academic databases and selected 195 trials from 20 studies with 12 study features. Overall accuracy, the most reported performance metric, was analyzed using double arcsine transformation and a three-level random effects model. The average overall accuracy of the AIC-optimized model was 0.80 [0.76, 0.84]. This paper provides two key insights: 1) Overall accuracy is widely used but often misleading due to its sensitivity to class imbalance and the number of sentiment classes, highlighting the need for normalization. 2) Standardized reporting of model performance, including reporting confusion matrices for independent test sets, is essential for reliable comparisons of ML classifiers across studies, which seems far from common practice.
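
For readers unfamiliar with the pooling machinery: the Freeman-Tukey double arcsine transform stabilizes the variance of proportions such as accuracies before meta-analytic averaging. A minimal sketch with Miller's back-transformation (the three-level random-effects model itself would be fit with dedicated meta-analysis software):

```python
import numpy as np

def double_arcsine(x: int, n: int) -> float:
    """Freeman-Tukey transform of x successes out of n trials."""
    return np.arcsin(np.sqrt(x / (n + 1))) + np.arcsin(np.sqrt((x + 1) / (n + 1)))

def inverse_double_arcsine(t: float, n: int) -> float:
    """Miller's (1978) back-transformation to a proportion."""
    s = np.sin(t)
    return 0.5 * (1 - np.sign(np.cos(t)) * np.sqrt(1 - (s + (s - 1 / s) / n) ** 2))

t = double_arcsine(80, 100)
print(inverse_double_arcsine(t, 100))   # ~0.80, recovering the observed accuracy
```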


【8】DiTTO-LLM: Framework for Discovering Topic-based Technology Opportunities via Large Language Model
Link: https://arxiv.org/abs/2509.09724

Authors: Kim, Sujeong Seo, Juhyun Lee
Comments: 5 figures
Abstract: Technology opportunities are critical information that serve as a foundation for advancements in technology, industry, and innovation. This paper proposes a framework based on the temporal relationships between technologies to identify emerging technology opportunities. The proposed framework begins by extracting text from a patent dataset, followed by mapping text-based topics to discover inter-technology relationships. Technology opportunities are then identified by tracking changes in these topics over time. To enhance efficiency, the framework leverages a large language model to extract topics and employs a prompt for a chat-based language model to support the discovery of technology opportunities. The framework was evaluated using an artificial intelligence patent dataset provided by the United States Patent and Trademark Office. The experimental results suggest that artificial intelligence technology is evolving into forms that facilitate everyday accessibility. This approach demonstrates the potential of the proposed framework to identify future technology opportunities.


【9】ALIGNS: Unlocking nomological networks in psychological measurement through a large language model
Link: https://arxiv.org/abs/2509.09723

Authors: rsen, Sen Yan, Roland Müller, Lan Sang, Mikko Rönkkö, Ravi Starzl, Donald Edmondson
Abstract: Psychological measurement is critical to many disciplines. Despite advances in measurement, building nomological networks, theoretical maps of how concepts and measures relate to establish validity, remains a challenge 70 years after Cronbach and Meehl proposed them as fundamental to validation. This limitation has practical consequences: clinical trials may fail to detect treatment effects, and public policy may target the wrong outcomes. We introduce Analysis of Latent Indicators to Generate Nomological Structures (ALIGNS), a large language model-based system trained with validated questionnaire measures. ALIGNS provides three comprehensive nomological networks containing over 550,000 indicators across psychology, medicine, social policy, and other fields. This represents the first application of large language models to solve a foundational problem in measurement validation. We report classification accuracy tests used to develop the model, as well as three evaluations. In the first evaluation, the widely used NIH PROMIS anxiety and depression instruments are shown to converge into a single dimension of emotional distress. The second evaluation examines child temperament measures and identifies four potential dimensions not captured by current frameworks, and questions one existing dimension. The third evaluation, an applicability check, engages expert psychometricians who assess the system's importance, accessibility, and suitability. ALIGNS is freely available at nomologicalnetwork.org, complementing traditional validation methods with large-scale nomological analysis.


【10】Generating Individual Travel Diaries Using Large Language Models Informed by Census and Land-Use Data
Link: https://arxiv.org/abs/2509.09710

Authors: lrokh Amin, Devin Rhoads, Fatemeh Fakhrmoosavi, Nicholas E. Lownes, John N. Ivan
Abstract: This study introduces a Large Language Model (LLM) scheme for generating individual travel diaries in agent-based transportation models. While traditional approaches rely on large quantities of proprietary household travel surveys, the method presented in this study generates personas stochastically from open-source American Community Survey (ACS) and Smart Location Database (SLD) data, then synthesizes diaries through direct prompting. This study features a novel one-to-cohort realism score: a composite of four metrics (Trip Count Score, Interval Score, Purpose Score, and Mode Score) validated against the Connecticut Statewide Transportation Study (CSTS) diaries, matched across demographic variables. The validation utilizes Jensen-Shannon Divergence to measure distributional similarities between generated and real diaries. When compared to diaries generated with classical methods (Negative Binomial for trip generation; Multinomial Logit for mode/purpose) calibrated on the validation set, LLM-generated diaries achieve comparable overall realism (LLM mean: 0.485 vs. 0.455). The LLM excels in determining trip purpose and demonstrates greater consistency (narrower realism score distribution), while classical models lead in numerical estimates of trip count and activity duration. Aggregate validation confirms the LLM's statistical representativeness (LLM mean: 0.612 vs. 0.435), demonstrating LLM's zero-shot viability and establishing a quantifiable metric of diary realism for future synthetic diary evaluation systems.
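
The distributional comparison behind each sub-score can be sketched with SciPy: bin an attribute (here, daily trip counts) for generated versus observed diaries and compute the Jensen-Shannon divergence. The 1 - JSD composition below is our simplification of the paper's composite metric:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def jsd(generated, real, bins=range(0, 12)) -> float:
    p, _ = np.histogram(generated, bins=bins, density=True)
    q, _ = np.histogram(real, bins=bins, density=True)
    return jensenshannon(p, q, base=2) ** 2   # SciPy returns the distance (sqrt of JSD)

rng = np.random.default_rng(0)
gen_trips = rng.poisson(3.0, 500)    # toy generated daily trip counts
real_trips = rng.poisson(3.2, 500)   # toy observed daily trip counts
trip_count_score = 1.0 - jsd(gen_trips, real_trips)
```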


【11】Powering Job Search at Scale: LLM-Enhanced Query Understanding in Job Matching Systems
Link: https://arxiv.org/abs/2509.09690

Authors: Jianqiang Shen, Qianqi Shen, Chunnan Yao, Kevin Kao, Dan Xu, Rajat Arora, Baofen Zheng, Caleb Johnson, Liangjie Hong, Jingwei Wu, Wenjing Zhang
Comments: CIKM2025
Abstract: Query understanding is essential in modern relevance systems, where user queries are often short, ambiguous, and highly context-dependent. Traditional approaches often rely on multiple task-specific Named Entity Recognition models to extract structured facets as seen in job search applications. However, this fragmented architecture is brittle, expensive to maintain, and slow to adapt to evolving taxonomies and language patterns. In this paper, we introduce a unified query understanding framework powered by a Large Language Model (LLM), designed to address these limitations. Our approach jointly models the user query and contextual signals such as profile attributes to generate structured interpretations that drive more accurate and personalized recommendations. The framework improves relevance quality in online A/B testing while significantly reducing system complexity and operational overhead. The results demonstrate that our solution provides a scalable and adaptable foundation for query understanding in dynamic web applications.


【12】Personas within Parameters: Fine-Tuning Small Language Models with Low-Rank Adapters to Mimic User Behaviors
Link: https://arxiv.org/abs/2509.09689

Authors: Thakur, Eshani Agrawal, Smruthi Mukund
Abstract: A long-standing challenge in developing accurate recommendation models is simulating user behavior, mainly due to the complex and stochastic nature of user interactions. Towards this, one promising line of work has been the use of Large Language Models (LLMs) for simulating user behavior. However, aligning these general-purpose large pre-trained models with user preferences necessitates: (i) effectively and continuously parsing large-scale tabular user-item interaction data, (ii) overcoming pre-training-induced inductive biases to accurately learn user specific knowledge, and (iii) achieving the former two at scale for millions of users. While most previous works have focused on complex methods to prompt an LLM or fine-tune it on tabular interaction datasets, our approach shifts the focus to extracting robust textual user representations using a frozen LLM and simulating cost-effective, resource-efficient user agents powered by fine-tuned Small Language Models (SLMs). Further, we showcase a method for training multiple low-rank adapters for groups of users, or personas, striking an optimal balance between scalability and performance of user behavior agents. Our experiments provide compelling empirical evidence of the efficacy of our methods, demonstrating that user agents developed using our approach have the potential to bridge the gap between offline metrics and real-world performance of recommender systems.
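
Per-persona adaptation with a shared frozen backbone can be sketched as a dictionary of low-rank deltas switched by persona id; this is our schematic of the idea, not the paper's implementation:

```python
import torch
import torch.nn as nn

class PersonaAdapters(nn.Module):
    """One rank-r adapter per persona on top of a shared frozen linear layer."""
    def __init__(self, base: nn.Linear, personas, r: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        self.down = nn.ParameterDict(
            {k: nn.Parameter(torch.randn(r, base.in_features) * 0.01) for k in personas})
        self.up = nn.ParameterDict(
            {k: nn.Parameter(torch.zeros(base.out_features, r)) for k in personas})

    def forward(self, x, persona: str):
        return self.base(x) + x @ self.down[persona].T @ self.up[persona].T

# Hypothetical persona names for illustration only.
m = PersonaAdapters(nn.Linear(64, 64), personas=["bargain_hunter", "early_adopter"])
y = m(torch.randn(4, 64), persona="early_adopter")
```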


Graphs (graph learning | graph neural networks | graph optimization, etc.) (4 papers)

【1】GraphCSVAE: Graph Categorical Structured Variational Autoencoder for Spatiotemporal Auditing of Physical Vulnerability Towards Sustainable Post-Disaster Risk Reduction
Link: https://arxiv.org/abs/2509.10308

Authors: masaka, Christian Geiß, Robert Muir-Wood, Emily So
Comments: Accepted full paper at the 8th International Disaster and Risk Conference, IDRC 2025 | Keywords: weakly supervised, graph deep learning, categorical distribution, physical vulnerability, remote sensing, spatiotemporal disaster risk, transition matrix | The data and code are respectively available at this https URL and this https URL
Abstract: In the aftermath of disasters, many institutions worldwide face challenges in continually monitoring changes in disaster risk, limiting the ability of key decision-makers to assess progress towards the UN Sendai Framework for Disaster Risk Reduction 2015-2030. While numerous efforts have substantially advanced the large-scale modeling of hazard and exposure through Earth observation and data-driven methods, progress remains limited in modeling another equally important yet challenging element of the risk equation: physical vulnerability. To address this gap, we introduce Graph Categorical Structured Variational Autoencoder (GraphCSVAE), a novel probabilistic data-driven framework for modeling physical vulnerability by integrating deep learning, graph representation, and categorical probabilistic inference, using time-series satellite-derived datasets and prior expert belief systems. We introduce a weakly supervised first-order transition matrix that reflects the changes in the spatiotemporal distribution of physical vulnerability in two disaster-stricken and socioeconomically disadvantaged areas: (1) the cyclone-impacted coastal Khurushkul community in Bangladesh and (2) the mudslide-affected city of Freetown in Sierra Leone. Our work reveals post-disaster regional dynamics in physical vulnerability, offering valuable insights into localized spatiotemporal auditing and sustainable strategies for post-disaster risk reduction.
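
A first-order transition matrix over categorical vulnerability states is simple to estimate by counting; a toy NumPy sketch with three states (our illustration of the statistic itself, not the weakly supervised estimator):

```python
import numpy as np

def transition_matrix(states: np.ndarray, n_states: int) -> np.ndarray:
    """Row-normalized counts of state_t -> state_{t+1} over a label time series."""
    T = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        T[a, b] += 1
    return T / T.sum(axis=1, keepdims=True).clip(min=1)

# e.g., 0 = low, 1 = medium, 2 = high physical vulnerability across time steps
print(transition_matrix(np.array([0, 0, 1, 1, 2, 1, 0]), n_states=3))
```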


【2】HGEN: Heterogeneous Graph Ensemble Networks
Link: https://arxiv.org/abs/2509.09843

Authors: en, Yufei Jin, Yi He, Xingquan Zhu
Comments: In proceedings of the 34th IJCAI Conference, 2025
Abstract: This paper presents HGEN that pioneers ensemble learning for heterogeneous graphs. We argue that the heterogeneity in node types, nodal features, and local neighborhood topology poses significant challenges for ensemble learning, particularly in accommodating diverse graph learners. Our HGEN framework ensembles multiple learners through a meta-path and transformation-based optimization pipeline to uplift classification accuracy. Specifically, HGEN uses meta-path combined with random dropping to create Allele Graph Neural Networks (GNNs), whereby the base graph learners are trained and aligned for later ensembling. To ensure effective ensemble learning, HGEN presents two key components: 1) a residual-attention mechanism to calibrate allele GNNs of different meta-paths, thereby enforcing node embeddings to focus on more informative graphs to improve base learner accuracy, and 2) a correlation-regularization term to enlarge the disparity among embedding matrices generated from different meta-paths, thereby enriching base learner diversity. We analyze the convergence of HGEN and attest to its higher regularization magnitude over simple voting. Experiments on five heterogeneous networks validate that HGEN consistently outperforms its state-of-the-art competitors by a substantial margin.
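
Of the two components, the correlation-regularization term is the easiest to make concrete: penalize similarity between the embedding matrices produced from different meta-paths so the base learners stay diverse. One plausible form (our sketch; the paper's exact term may differ):

```python
import torch
import torch.nn.functional as F

def correlation_penalty(embs: list) -> torch.Tensor:
    """Mean squared cosine similarity between standardized, flattened embedding
    matrices; minimizing it enlarges disparity across meta-path learners."""
    flat = [((e - e.mean(0)) / (e.std(0) + 1e-6)).flatten() for e in embs]
    loss, pairs = torch.zeros(()), 0
    for i in range(len(flat)):
        for j in range(i + 1, len(flat)):
            loss = loss + F.cosine_similarity(flat[i], flat[j], dim=0) ** 2
            pairs += 1
    return loss / max(pairs, 1)

print(correlation_penalty([torch.randn(100, 32) for _ in range(3)]))
```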


【3】Structure Matters: Brain Graph Augmentation via Learnable Edge Masking for Data-efficient Psychiatric Diagnosis
Link: https://arxiv.org/abs/2509.09744

Authors: , Chenze Wang, Liping Chen, Nguyen Linh Dan Le, Niharika Tewari, Ting Dang, Jiangang Ma, Feng Xia
Abstract: The limited availability of labeled brain network data makes it challenging to achieve accurate and interpretable psychiatric diagnoses. While self-supervised learning (SSL) offers a promising solution, existing methods often rely on augmentation strategies that can disrupt crucial structural semantics in brain graphs. To address this, we propose SAM-BG, a two-stage framework for learning brain graph representations with structural semantic preservation. In the pre-training stage, an edge masker is trained on a small labeled subset to capture key structural semantics. In the SSL stage, the extracted structural priors guide a structure-aware augmentation process, enabling the model to learn more semantically meaningful and robust representations. Experiments on two real-world psychiatric datasets demonstrate that SAM-BG outperforms state-of-the-art methods, particularly in small-labeled data settings, and uncovers clinically relevant connectivity patterns that enhance interpretability. Our code is available at https://github.com/mjliu99/SAM-BG.


【4】Why does your graph neural network fail on some graphs? Insights from exact generalisation error
Link: https://arxiv.org/abs/2509.10337

Authors: , Mahalakshmi Sabanayagam, Debarghya Ghoshdastidar
Abstract: Graph Neural Networks (GNNs) are widely used in learning on graph-structured data, yet a principled understanding of why they succeed or fail remains elusive. While prior works have examined architectural limitations such as over-smoothing and over-squashing, these do not explain what enables GNNs to extract meaningful representations or why performance varies drastically between similar architectures. These questions are related to the role of generalisation: the ability of a model to make accurate predictions on unlabelled data. Although several works have derived generalisation error bounds for GNNs, these are typically loose, restricted to a single architecture, and offer limited insight into what governs generalisation in practice. In this work, we take a different approach by deriving the exact generalisation error for GNNs in a transductive fixed-design setting through the lens of signal processing. From this viewpoint, GNNs can be interpreted as graph filter operators that act on node features via the graph structure. By focusing on linear GNNs while allowing non-linearity in the graph filters, we derive the first exact generalisation error for a broad range of GNNs, including convolutional, PageRank-based, and attention-based models. The exact characterisation of the generalisation error reveals that only the aligned information between node features and graph structure contributes to generalisation. Furthermore, we quantify the effect of homophily on generalisation. Our work provides a framework that explains when and why GNNs can effectively leverage structural and feature information, offering practical guidance for model selection.
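
The graph-filter view of a linear GNN is compact enough to state in code: the prediction is a polynomial in the (normalized) adjacency applied to the features, and different coefficient choices recover different architectures. A sketch:

```python
import numpy as np

def graph_filter_predict(A: np.ndarray, X: np.ndarray, W: np.ndarray, coeffs):
    """y = (sum_k c_k A^k) X W: a linear GNN as a polynomial graph filter g(A)."""
    gA = sum(c * np.linalg.matrix_power(A, k) for k, c in enumerate(coeffs))
    return gA @ X @ W

# coeffs=(0, 1) is one propagation step; (1, 1) adds a self/residual term.
rng = np.random.default_rng(0)
A = (rng.random((4, 4)) > 0.5).astype(float)
A = (A + A.T) / 2   # toy symmetric adjacency
y = graph_filter_predict(A, rng.standard_normal((4, 3)), rng.standard_normal((3, 2)), (1, 1))
```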


Transformers (3 papers)

【1】WhisTLE: Deeply Supervised, Text-Only Domain Adaptation for Pretrained Speech Recognition Transformers
Link: https://arxiv.org/abs/2509.10452

Authors: ndey, Karun Kumar, Raphael Tang
Comments: 5 pages, 2 figures
Abstract: Pretrained automatic speech recognition (ASR) models such as Whisper perform well but still need domain adaptation to handle unseen vocabulary and parlance. In many real-world settings, collecting speech data is impractical, necessitating text-only adaptation. We propose WhisTLE, a deeply supervised, text-only adaptation method for pretrained encoder-decoder ASR models. WhisTLE trains a variational autoencoder (VAE) to model encoder outputs from text and fine-tunes the decoder using the learned text-to-latent encoder, optionally combined with text-to-speech (TTS) adaptation. At inference, the original encoder is restored, incurring no extra runtime cost. Across four out-of-domain datasets and four ASR models, WhisTLE with TTS reduces word error rate (WER) by 12.3% relative to TTS-only adaptation and outperforms all non-WhisTLE baselines in 27 of 32 scenarios.


【2】I-Segmenter: Integer-Only Vision Transformer for Efficient Semantic Segmentation
Link: https://arxiv.org/abs/2509.10334

Authors: ssoon, Michal Szczepanski, Martyna Poreba
Abstract: Vision Transformers (ViTs) have recently achieved strong results in semantic segmentation, yet their deployment on resource-constrained devices remains limited due to their high memory footprint and computational cost. Quantization offers an effective strategy to improve efficiency, but ViT-based segmentation models are notoriously fragile under low precision, as quantization errors accumulate across deep encoder-decoder pipelines. We introduce I-Segmenter, the first fully integer-only ViT segmentation framework. Building on the Segmenter architecture, I-Segmenter systematically replaces floating-point operations with integer-only counterparts. To further stabilize both training and inference, we propose λ-ShiftGELU, a novel activation function that mitigates the limitations of uniform quantization in handling long-tailed activation distributions. In addition, we remove the L2 normalization layer and replace bilinear interpolation in the decoder with nearest neighbor upsampling, ensuring integer-only execution throughout the computational graph. Extensive experiments show that I-Segmenter achieves accuracy within a reasonable margin of its FP32 baseline (5.1% on average), while reducing model size by up to 3.8x and enabling up to 1.2x faster inference with optimized runtimes. Notably, even in one-shot PTQ with a single calibration image, I-Segmenter delivers competitive accuracy, underscoring its practicality for real-world deployment.


【3】Adaptive Token Merging for Efficient Transformer Semantic Communication at the Edge
Link: https://arxiv.org/abs/2509.09955

Authors: , Omar Alhussein, Hatem Abou-Zeid, Mehdi Bennis, Sami Muhaidat
Comments: Submitted to IEEE Journals
Abstract: Large-scale transformers are central to modern semantic communication, yet their high computational and communication costs hinder deployment on resource-constrained edge devices. This paper introduces a training-free framework for adaptive token merging, a novel mechanism that compresses transformer representations at runtime by selectively merging semantically redundant tokens under per-layer similarity thresholds. Unlike prior fixed-ratio reduction, our approach couples merging directly to input redundancy, enabling data-dependent adaptation that balances efficiency and task relevance without retraining. We cast the discovery of merging strategies as a multi-objective optimization problem and leverage Bayesian optimization to obtain Pareto-optimal trade-offs between accuracy, inference cost, and communication cost. On ImageNet classification, we match the accuracy of the unmodified transformer with 30% fewer floating-point operations per second and under 20% of the original communication cost, while for visual question answering our method achieves performance competitive with the full LLaVA model at less than one-third of the compute and one-tenth of the bandwidth. Finally, we show that our adaptive merging is robust across varying channel conditions and provides inherent privacy benefits, substantially degrading the efficacy of model inversion attacks. Our framework provides a practical and versatile solution for deploying powerful transformer models in resource-limited edge intelligence scenarios.
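
The merging primitive can be sketched in a few lines: walk the token sequence, and whenever the cosine similarity to the running token exceeds the layer's threshold, average the pair. This is a greedy variant for illustration; the paper's policy and threshold selection are more elaborate:

```python
import torch
import torch.nn.functional as F

def merge_tokens(x: torch.Tensor, tau: float) -> torch.Tensor:
    """x: (T, d) token sequence. Merge token t into its predecessor when
    their cosine similarity exceeds tau, replacing the pair by the mean."""
    out = [x[0]]
    for t in range(1, x.shape[0]):
        if F.cosine_similarity(out[-1], x[t], dim=0) > tau:
            out[-1] = (out[-1] + x[t]) / 2   # merge semantically redundant token
        else:
            out.append(x[t])
    return torch.stack(out)

x = torch.randn(16, 64)
print(merge_tokens(x, tau=0.9).shape)   # <= 16 tokens; fewer when inputs are redundant
```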


GANs | Adversarial | Attacks | Generation (4 papers)

【1】Limited Reference, Reliable Generation: A Two-Component Framework for Tabular Data Generation in Low-Data Regimes
Link: https://arxiv.org/abs/2509.09960

Authors: Jiang, Yongxin Wang, Ziyue Dai, Yicun Liu, Hongyi Nie, Sen Liu, Hongfeng Chai
Abstract: Synthetic tabular data generation is increasingly essential in data management, supporting downstream applications when real-world and high-quality tabular data is insufficient. Existing tabular generation approaches, such as generative adversarial networks (GANs), diffusion models, and fine-tuned Large Language Models (LLMs), typically require sufficient reference data, limiting their effectiveness in domain-specific databases with scarce records. While prompt-based LLMs offer flexibility without parameter tuning, they often fail to capture dataset-specific feature-label dependencies and generate redundant data, leading to degradation in downstream task performance. To overcome these issues, we propose ReFine, a framework that (i) derives symbolic "if-then" rules from interpretable models and embeds them into prompts to explicitly guide generation toward domain-specific feature distribution, and (ii) applies a dual-granularity filtering strategy that suppresses over-sampling patterns and selectively refines rare but informative samples to reduce distributional imbalance. Extensive experiments on various regression and classification benchmarks demonstrate that ReFine consistently outperforms state-of-the-art methods, achieving up to 0.44 absolute improvement in R-squared for regression and 10.0 percent relative improvement in F1 score for classification tasks.
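
The rule-to-prompt step can be illustrated with scikit-learn: fit a shallow interpretable model on the reference records, export its if-then paths as text, and splice them into the generation prompt. A sketch on a stand-in dataset (the paper's rule extraction and prompt format are not specified here):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=3).fit(data.data, data.target)
rules = export_text(tree, feature_names=list(data.feature_names))

prompt = (
    "Generate synthetic tabular rows consistent with these feature-label rules:\n"
    f"{rules}\n"
    "Return the rows as CSV."
)
print(prompt)
```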


【2】DyKen-Hyena: Dynamic Kernel Generation via Cross-Modal Attention for Multimodal Intent Recognition
Link: https://arxiv.org/abs/2509.09940

Authors: g, Wenbin Wang, Yong Luo
Comments: 8 pages, 2 figures
Abstract: Though Multimodal Intent Recognition (MIR) proves effective by utilizing rich information from multiple sources (e.g., language, video, and audio), the potential for intent-irrelevant and conflicting information across modalities may hinder performance from being further improved. Most current models attempt to fuse modalities by applying mechanisms like multi-head attention to unimodal feature sequences and then adding the result back to the original representation. This process risks corrupting the primary linguistic features with noisy or irrelevant non-verbal signals, as it often fails to capture the fine-grained, token-level influence where non-verbal cues should modulate, not just augment, textual meaning. To address this, we introduce DyKen-Hyena, which reframes the problem from feature fusion to processing modulation. Our model translates audio-visual cues into dynamic, per-token convolutional kernels that directly modulate textual feature extraction. This fine-grained approach achieves state-of-the-art results on the MIntRec and MIntRec2.0 benchmarks. Notably, it yields a +10.46% F1-score improvement in out-of-scope detection, validating that our method creates a fundamentally more robust intent representation.
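
The processing-modulation idea, as we read it: a small network turns the audio-visual features at each text position into a short convolution kernel, which is then applied to the text features at that position. A schematic PyTorch sketch (kernel size, softmax normalization, and causal padding are our assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicKernelModulation(nn.Module):
    def __init__(self, d_text: int, d_av: int, k: int = 3):
        super().__init__()
        self.k = k
        self.kernel_gen = nn.Linear(d_av, k)   # one k-tap kernel per token

    def forward(self, text: torch.Tensor, av: torch.Tensor) -> torch.Tensor:
        # text: (B, T, D) textual features; av: (B, T, d_av) aligned audio-visual cues
        kernels = torch.softmax(self.kernel_gen(av), dim=-1)     # (B, T, k)
        padded = F.pad(text.transpose(1, 2), (self.k - 1, 0))    # causal left-pad
        windows = padded.unfold(-1, self.k, 1)                   # (B, D, T, k)
        return torch.einsum("bdtk,btk->btd", windows, kernels)

mod = DynamicKernelModulation(d_text=256, d_av=128)
out = mod(torch.randn(2, 10, 256), torch.randn(2, 10, 128))      # (2, 10, 256)
```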


【3】A Modular and Multimodal Generative AI Framework for Urban Building Energy Data: Generating Synthetic Homes
Link: https://arxiv.org/abs/2509.09794

Authors: shbaugh, Chetan Tiwari, Jorge Silveyra
Comments: 44 pages; 2 appendices; 9 figures; 1 table. Code available at this https URL
Abstract: Computational models have emerged as powerful tools for energy modeling research, touting scalability and quantitative results. However, these models require a plethora of data, some of which is inaccessible, expensive, or raises privacy concerns. We introduce a modular multimodal framework to produce this data from publicly accessible residential information and images using generative artificial intelligence (AI). Additionally, we provide a pipeline demonstrating this framework, and we evaluate its generative AI components. Our experiments show that our framework's use of AI avoids common issues with generative models. Our framework produces realistic, labeled data. By reducing dependence on costly or restricted data sources, we pave a path towards more accessible and reproducible research.


【4】Testing chatbots on the creation of encoders for audio conditioned image generation
Link: https://arxiv.org/abs/2509.09717

Authors: León, Miguel Carrasco
Abstract: On one hand, recent advances in chatbots have led to a rising popularity in using these models for coding tasks. On the other hand, modern generative image models primarily rely on text encoders to translate semantic concepts into visual representations, even when there is clear evidence that audio can be employed as input as well. Given the above, in this work, we explore whether state-of-the-art conversational agents can design effective audio encoders to replace the CLIP text encoder from Stable Diffusion 1.5, enabling image synthesis directly from sound. We prompted five publicly available chatbots to propose neural architectures to work as these audio encoders, with a set of well-explained shared conditions. Each valid suggested encoder was trained on over two million context-related audio-image-text observations, and evaluated on held-out validation and test sets using various metrics, together with a qualitative analysis of their generated images. Although almost all chatbots generated valid model designs, none achieved satisfactory results, indicating that their audio embeddings failed to align reliably with those of the original text encoder. Among the proposals, the Gemini audio encoder showed the best quantitative metrics, while the Grok audio encoder produced more coherent images (particularly when paired with the text encoder). Our findings reveal a shared architectural bias across chatbots and underscore the remaining coding gap that needs to be bridged in future versions of these models. We also created a public demo so everyone could study and try out these audio encoders. Finally, we propose research questions that should be tackled in the future, and encourage other researchers to perform more focused and highly specialized tasks like this one, so the respective chatbots cannot make use of well-known solutions and their creativity/reasoning is fully tested.


Semi-/Weakly-/Un-/Fully-Supervised | Uncertainty | Active Learning (5 papers)

【1】SSL-AD: Spatiotemporal Self-Supervised Learning for Generalizability and Adaptability Across Alzheimer's Prediction Tasks and Datasets
Link: https://arxiv.org/abs/2509.10453

Authors: zmarek, Justin Szeto, Brennan Nichyporuk, Tal Arbel
Abstract: Alzheimer's disease is a progressive, neurodegenerative disorder that causes memory loss and cognitive decline. While there has been extensive research in applying deep learning models to Alzheimer's prediction tasks, these models remain limited by lack of available labeled data, poor generalization across datasets, and inflexibility to varying numbers of input scans and time intervals between scans. In this study, we adapt three state-of-the-art temporal self-supervised learning (SSL) approaches for 3D brain MRI analysis, and add novel extensions designed to handle variable-length inputs and learn robust spatial features. We aggregate four publicly available datasets comprising 3,161 patients for pre-training, and show the performance of our model across multiple Alzheimer's prediction tasks including diagnosis classification, conversion detection, and future conversion prediction. Importantly, our SSL model implemented with temporal order prediction and contrastive learning outperforms supervised learning on six out of seven downstream tasks. It demonstrates adaptability and generalizability across tasks and number of input images with varying time intervals, highlighting its capacity for robust performance across clinical applications. We release our code and model publicly at https://github.com/emilykaczmarek/SSL-AD.


【2】Vendi Information Gain for Active Learning and its Application to Ecology
Link: https://arxiv.org/abs/2509.10390

Authors: en, Adji Bousso Dieng
Abstract: While monitoring biodiversity through camera traps has become an important endeavor for ecological research, identifying species in the captured image data remains a major bottleneck due to limited labeling resources. Active learning -- a machine learning paradigm that selects the most informative data to label and train a predictive model -- offers a promising solution, but typically focuses on uncertainty in the individual predictions without considering uncertainty across the entire dataset. We introduce a new active learning policy, Vendi information gain (VIG), that selects images based on their impact on dataset-wide prediction uncertainty, capturing both informativeness and diversity. Applied to the Snapshot Serengeti dataset, VIG achieves impressive predictive accuracy close to full supervision using less than 10% of the labels. It consistently outperforms standard baselines across metrics and batch sizes, collecting more diverse data in the feature space. VIG has broad applicability beyond ecology, and our results highlight its value for biodiversity monitoring in data-limited environments.
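
For context, the Vendi score that VIG builds on is the exponential of the entropy of the eigenvalues of a normalized similarity matrix, a differentiable diversity measure. A minimal NumPy sketch of the score itself (the information-gain acquisition on top of it is the paper's contribution and is omitted):

```python
import numpy as np

def vendi_score(X: np.ndarray) -> float:
    """exp(entropy of eigenvalues of K/n) for a cosine-similarity matrix K."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    lam = np.linalg.eigvalsh(Xn @ Xn.T / X.shape[0])
    lam = lam[lam > 1e-12]
    return float(np.exp(-(lam * np.log(lam)).sum()))

rng = np.random.default_rng(0)
print(vendi_score(rng.standard_normal((50, 8))))   # higher = more diverse features
```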


【3】Uncertainty-Aware Tabular Prediction: Evaluating VBLL-Enhanced TabPFN in Safety-Critical Medical Data
Link: https://arxiv.org/abs/2509.10048

Authors: Ramalingam
Abstract: Predictive models are being increasingly used across a wide range of domains, including safety-critical applications such as medical diagnosis and criminal justice. Reliable uncertainty estimation is a crucial task in such settings. Tabular Prior-data Fitted Network (TabPFN) is a recently proposed machine learning foundation model for tabular datasets, which uses a generative transformer architecture. Variational Bayesian Last Layers (VBLL) is a state-of-the-art lightweight variational formulation that effectively improves uncertainty estimation with minimal computational overhead. In this work we aim to evaluate the performance of VBLL integrated with the recently proposed TabPFN in uncertainty calibration. Our experiments, conducted on three benchmark medical tabular datasets, compare the performance of the original TabPFN and the VBLL-integrated version. Contrary to expectations, we observed that original TabPFN consistently outperforms VBLL integrated TabPFN in uncertainty calibration across all datasets.


【4】Exploring Expert Specialization through Unsupervised Training in Sparse Mixture of Experts
Link: https://arxiv.org/abs/2509.10025

Authors: Nikolic, Ilker Oguz, Demetri Psaltis
Comments: 14 pages, 7 figures
Abstract: Understanding the internal organization of neural networks remains a fundamental challenge in deep learning interpretability. We address this challenge by exploring a novel Sparse Mixture of Experts Variational Autoencoder (SMoE-VAE) architecture. We test our model on the QuickDraw dataset, comparing unsupervised expert routing against a supervised baseline guided by ground-truth labels. Surprisingly, we find that unsupervised routing consistently achieves superior reconstruction performance. The experts learn to identify meaningful sub-categorical structures that often transcend human-defined class boundaries. Through t-SNE visualizations and reconstruction analysis, we investigate how MoE models uncover fundamental data structures that are more aligned with the model's objective than predefined labels. Furthermore, our study on the impact of dataset size provides insights into the trade-offs between data quantity and expert specialization, offering guidance for designing efficient MoE architectures.
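
The sparse routing at the heart of an SMoE layer, in generic form: a gate scores all experts per input, only the top-k run, and their outputs are mixed with renormalized gate weights. A standard sketch, not the paper's SMoE-VAE:

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    def __init__(self, d: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
            for _ in range(n_experts))
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, d)
        w, idx = self.gate(x).topk(self.k, dim=-1)        # route each input to k experts
        w = torch.softmax(w, dim=-1)                      # renormalized mixing weights
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += w[mask, slot, None] * expert(x[mask])
        return out

moe = SparseMoE(d=32)
print(moe(torch.randn(4, 32)).shape)
```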


【5】LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning in Open-World Scenarios
Link: https://arxiv.org/abs/2509.09926

Authors: en, Zhiyuan Huang, Yurou Liu, Bing Su
Abstract: Long-tailed learning has garnered increasing attention due to its wide applicability in real-world scenarios. Among existing approaches, Long-Tailed Semi-Supervised Learning (LTSSL) has emerged as an effective solution by incorporating a large amount of unlabeled data into the imbalanced labeled dataset. However, most prior LTSSL methods are designed to train models from scratch, which often leads to issues such as overconfidence and low-quality pseudo-labels. To address these challenges, we extend LTSSL into the foundation model fine-tuning paradigm and propose a novel framework: LoFT (Long-tailed semi-supervised learning via parameter-efficient Fine-Tuning). We demonstrate that fine-tuned foundation models can generate more reliable pseudolabels, thereby benefiting imbalanced learning. Furthermore, we explore a more practical setting by investigating semi-supervised learning under open-world conditions, where the unlabeled data may include out-of-distribution (OOD) samples. To handle this problem, we propose LoFT-OW (LoFT under Open-World scenarios) to improve the discriminative ability. Experimental results on multiple benchmarks demonstrate that our method achieves superior performance compared to previous approaches, even when utilizing only 1% of the unlabeled data compared with previous works.


Transfer | Zero/Few/One-Shot | Adaptation (6 papers)

【1】MCL-AD: Multimodal Collaboration Learning for Zero-Shot 3D Anomaly Detection
Link: https://arxiv.org/abs/2509.10282

Authors: Tianjiao Chen, Mingle Zhou, Min Li, Delong Han, Jin Wan
Comments: 14 pages, 5 figures
Abstract: Zero-shot 3D (ZS-3D) anomaly detection aims to identify defects in 3D objects without relying on labeled training data, making it especially valuable in scenarios constrained by data scarcity, privacy, or high annotation cost. However, most existing methods focus exclusively on point clouds, neglecting the rich semantic cues available from complementary modalities such as RGB images and text priors. This paper introduces MCL-AD, a novel framework that leverages multimodal collaboration learning across point clouds, RGB images, and text semantics to achieve superior zero-shot 3D anomaly detection. Specifically, we propose a Multimodal Prompt Learning Mechanism (MPLM) that enhances the intra-modal representation capability and inter-modal collaborative learning by introducing an object-agnostic decoupled text prompt and a multimodal contrastive loss. In addition, a collaborative modulation mechanism (CMM) is proposed to fully leverage the complementary representations of point clouds and RGB images by jointly modulating the RGB image-guided and point cloud-guided branches. Extensive experiments demonstrate that the proposed MCL-AD framework achieves state-of-the-art performance in ZS-3D anomaly detection.


【2】Property prediction for ionic liquids without prior structural knowledge using limited experimental data: A data-driven neural recommender system leveraging transfer learning
Link: https://arxiv.org/abs/2509.10273

Authors: hi, Kai Sundmacher, Caroline Ganzer
Abstract: Ionic liquids (ILs) have emerged as versatile replacements for traditional solvents because their physicochemical properties can be precisely tailored to various applications. However, accurately predicting key thermophysical properties remains challenging due to the vast chemical design space and the limited availability of experimental data. In this study, we present a data-driven transfer learning framework that leverages a neural recommender system (NRS) to enable reliable property prediction for ILs using sparse experimental datasets. The approach involves a two-stage process: first, pre-training NRS models on COSMO-RS-based simulated data at fixed temperature and pressure to learn property-specific structural embeddings for cations and anions; and second, fine-tuning simple feedforward neural networks using these embeddings with experimental data at varying temperatures and pressures. In this work, five essential IL properties are considered: density, viscosity, surface tension, heat capacity, and melting point. The framework supports both within-property and cross-property knowledge transfer. Notably, pre-trained models for density, viscosity, and heat capacity are used to fine-tune models for all five target properties, achieving improved performance by a substantial margin for four of them. The model exhibits robust extrapolation to previously unseen ILs. Moreover, the final trained models enable property prediction for over 700,000 IL combinations, offering a scalable solution for IL screening in process design. This work highlights the effectiveness of combining simulated data and transfer learning to overcome sparsity in the experimental data.


【3】Prototypical Contrastive Learning For Improved Few-Shot Audio Classification
标题:用于改进Few-Shot音频分类的原型对比学习
链接:https://arxiv.org/abs/2509.10074

作者:Sgouropoulos, Christos Nikou, Stefanos Vlachos, Vasileios Theiou, Christos Foukanelis, Theodoros Giannakopoulos
备注:Accepted and Presented at IEEE International Workshop on Machine Learning for Signal Processing, Aug.\ 31-- Sep.\ 3, 2025, Istanbul, Turkey , 6 pages, 2 figures, 1 table
摘要:Few-Shot学习已经成为利用有限标记数据训练模型的强大范式,解决了大规模标注不切实际场景下的挑战。虽然在图像领域已有广泛研究,但音频分类中的Few-Shot学习仍相对欠缺探索。在这项工作中,我们研究了将监督对比损失集成到用于音频分类的原型Few-Shot训练中的效果。具体而言,我们证明了与标准对比损失相比,角度损失能进一步提升性能。我们的方法利用SpecAugment,再通过一个自注意力机制,将输入的多个增强版本所携带的多样化信息封装成一个统一的嵌入。我们在MetaAudio上评估我们的方法,这是一个包含五个数据集的基准,具有预定义的划分、标准化的预处理以及一组全面的Few-Shot学习模型用于比较。所提出的方法在5-way、5-shot设置中实现了最先进的性能。
摘要:Few-shot learning has emerged as a powerful paradigm for training models with limited labeled data, addressing challenges in scenarios where large-scale annotation is impractical. While extensive research has been conducted in the image domain, few-shot learning in audio classification remains relatively underexplored. In this work, we investigate the effect of integrating supervised contrastive loss into prototypical few shot training for audio classification. In detail, we demonstrate that angular loss further improves the performance compared to the standard contrastive loss. Our method leverages SpecAugment followed by a self-attention mechanism to encapsulate diverse information of augmented input versions into one unified embedding. We evaluate our approach on MetaAudio, a benchmark including five datasets with predefined splits, standardized preprocessing, and a comprehensive set of few-shot learning models for comparison. The proposed approach achieves state-of-the-art performance in a 5-way, 5-shot setting.
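下面给出一个最小的numpy示意,演示原型式Few-Shot分类的核心步骤(按类别对支持集嵌入求平均得到原型,再用余弦相似度对查询样本打分)。这只是对摘要所述流程的通用骨架,SpecAugment、自注意力聚合与角度损失等论文组件均未包含,嵌入为随机生成,仅作演示。

```python
import numpy as np

def prototypes(support_emb, support_labels, n_classes):
    # 按类别对支持集嵌入取平均, 得到每类原型
    return np.stack([support_emb[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def cosine_scores(query_emb, protos):
    # 余弦相似度 = L2归一化后的点积
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return q @ p.T

# 5-way 5-shot 玩具示例(随机嵌入仅作演示)
rng = np.random.default_rng(0)
support = rng.normal(size=(25, 64))            # 5类 x 每类5个支持样本
labels = np.repeat(np.arange(5), 5)
query = rng.normal(size=(10, 64))
protos = prototypes(support, labels, 5)
pred = cosine_scores(query, protos).argmax(axis=1)
print(pred)                                    # 每个查询样本的预测类别
```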


【4】Hybrid Adaptive Conformal Offline Reinforcement Learning for Fair Population Health Management
标题:公平人口健康管理的混合自适应共形离线强化学习
链接:https://arxiv.org/abs/2509.09772

作者:su, Sadiq Y. Patel, Parth Sheth, Bhairavi Muralidharan, Namrata Elamaran, Aakriti Kinra, Rajaie Batniji
备注:10 pages, 5 figures, 4 tables
摘要:面向医疗补助(Medicaid)人群的人群健康管理计划协调纵向外展和服务(例如福利导航、行为健康、社会需求支持和临床调度),并且必须安全、公平和可审计。我们提出了一个混合自适应共形离线强化学习(HACO)框架,该框架将风险校准与偏好优化分离,以大规模生成保守的行动建议。在我们的设置中,每一步都涉及在常见的协调行动中进行选择(例如,联系哪个成员、通过哪种方式、以及是否转介到专门服务),同时控制不利使用事件(例如计划外的急诊就诊或住院)的近期风险。使用来自Waymark的去标识化运营数据集(包括168,126名患者的277万个顺序决策),HACO(i)训练一个不良事件的轻量级风险模型,(ii)导出共形阈值以在目标风险水平下屏蔽不安全行动,以及(iii)在所得安全子集上学习偏好策略。我们使用版本无关的拟合Q评估(FQE)在分层子集上评估策略,并审计年龄、性别和种族各亚组的表现。HACO实现了较强的风险区分能力(AUC ≈ 0.81),校准阈值为 τ ≈ 0.038(α = 0.10),同时保持了高安全覆盖率。亚组分析揭示了不同人口统计群体之间估计价值的系统性差异,凸显了公平性审计的重要性。我们的结果表明,共形风险门控可以与离线RL干净地集成,为人群健康管理团队提供保守、可审计的决策支持。
摘要:Population health management programs for Medicaid populations coordinate longitudinal outreach and services (e.g., benefits navigation, behavioral health, social needs support, and clinical scheduling) and must be safe, fair, and auditable. We present a Hybrid Adaptive Conformal Offline Reinforcement Learning (HACO) framework that separates risk calibration from preference optimization to generate conservative action recommendations at scale. In our setting, each step involves choosing among common coordination actions (e.g., which member to contact, by which modality, and whether to route to a specialized service) while controlling the near-term risk of adverse utilization events (e.g., unplanned emergency department visits or hospitalizations). Using a de-identified operational dataset from Waymark comprising 2.77 million sequential decisions across 168,126 patients, HACO (i) trains a lightweight risk model for adverse events, (ii) derives a conformal threshold to mask unsafe actions at a target risk level, and (iii) learns a preference policy on the resulting safe subset. We evaluate policies with a version-agnostic fitted Q evaluation (FQE) on stratified subsets and audit subgroup performance across age, sex, and race. HACO achieves strong risk discrimination (AUC $\approx$ 0.81) with a calibrated threshold ($\tau \approx 0.038$ at $\alpha = 0.10$), while maintaining high safe coverage. Subgroup analyses reveal systematic differences in estimated value across demographics, underscoring the importance of fairness auditing. Our results show that conformal risk gating integrates cleanly with offline RL to deliver conservative, auditable decision support for population health management teams.
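作为摘要中"导出共形阈值以屏蔽不安全行动"这一步骤的直观说明,下面是一个假设性的numpy草图:在校准集上取风险分数的分裂共形分位数作为阈值τ,推理时仅在风险不超过τ的动作子集中按偏好选取。函数名与回退逻辑均为示意,并非HACO的原始实现。

```python
import numpy as np

def conformal_threshold(calib_risk, alpha=0.10):
    # 分裂共形: 取校准集风险分数的 ceil((n+1)(1-alpha))/n 经验分位数
    n = len(calib_risk)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(calib_risk, q)

def safe_action(pref_scores, risk_scores, tau):
    # 屏蔽风险超过阈值tau的动作, 在安全子集中取偏好最高者
    safe = risk_scores <= tau
    if not safe.any():                 # 无安全动作时回退到风险最低者(示意)
        return int(np.argmin(risk_scores))
    return int(np.argmax(np.where(safe, pref_scores, -np.inf)))

rng = np.random.default_rng(1)
tau = conformal_threshold(rng.uniform(size=1000), alpha=0.10)
print(tau, safe_action(rng.normal(size=6), rng.uniform(size=6), tau))
```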


【5】FetalSleepNet: A Transfer Learning Framework with Spectral Equalisation Domain Adaptation for Fetal Sleep Stage Classification
标题:FetalSleepNet:一种用于胎儿睡眠阶段分类的频谱均衡域自适应迁移学习框架
链接:https://arxiv.org/abs/2509.10082

作者:ng, Johann Vargas-Calixto, Nasim Katebi, Nhi Tran, Sharmony B. Kelly, Gari D. Clifford, Robert Galinsky, Faezeh Marzbanrad
备注:13 pages, 4 tables, 5 figures, submitted to IEEE Journal of Biomedical and Health Informatics
摘要:简介:这项研究提出了FetalSleepNet,这是第一个发表的用于对绵羊脑电图(EEG)睡眠状态进行分类的深度学习方法。胎儿脑电图采集复杂,难以进行一致的解读,且费时费力。然而,准确的睡眠阶段分类可能有助于早期检测与妊娠并发症(例如缺氧或宫内生长受限)相关的异常脑成熟。   方法:将脑电图电极固定于24只妊娠晚期胎羊顶叶皮层上方的硬脑膜上。最初为成人EEG睡眠分期开发的轻量级深度神经网络,通过来自成人EEG的迁移学习在绵羊EEG上进行训练。采用基于频谱均衡的域自适应策略来减小跨域失配。   结果:我们证明,虽然直接迁移表现不佳,但完全微调结合频谱均衡实现了最佳的整体性能(准确率:86.6%,宏F1分数:62.5),优于基线模型。   结论:据我们所知,FetalSleepNet是第一个专门为胎儿EEG自动睡眠分期开发的深度学习框架。在实验室之外,基于EEG的睡眠阶段分类器可用作标注引擎,实现大规模弱/半监督标注和蒸馏,以促进对可在临床中采集的侵入性较小的信号(例如多普勒超声或心电图数据)的训练。FetalSleepNet的轻量级设计使其非常适合部署在低功耗、实时和可穿戴的胎儿监护系统中。
摘要:Introduction: This study presents FetalSleepNet, the first published deep learning approach to classifying sleep states from the ovine electroencephalogram (EEG). Fetal EEG is complex to acquire and difficult and laborious to interpret consistently. However, accurate sleep stage classification may aid in the early detection of abnormal brain maturation associated with pregnancy complications (e.g. hypoxia or intrauterine growth restriction).   Methods: EEG electrodes were secured onto the ovine dura over the parietal cortices of 24 late gestation fetal sheep. A lightweight deep neural network originally developed for adult EEG sleep staging was trained on the ovine EEG using transfer learning from adult EEG. A spectral equalisation-based domain adaptation strategy was used to reduce cross-domain mismatch.   Results: We demonstrated that while direct transfer performed poorly, full fine tuning combined with spectral equalisation achieved the best overall performance (accuracy: 86.6 percent, macro F1-score: 62.5), outperforming baseline models.   Conclusions: To the best of our knowledge, FetalSleepNet is the first deep learning framework specifically developed for automated sleep staging from the fetal EEG. Beyond the laboratory, the EEG-based sleep stage classifier functions as a label engine, enabling large scale weak/semi supervised labeling and distillation to facilitate training on less invasive signals that can be acquired in the clinic, such as Doppler Ultrasound or electrocardiogram data. FetalSleepNet's lightweight design makes it well suited for deployment in low power, real time, and wearable fetal monitoring systems.


【6】Sparse Polyak: an adaptive step size rule for high-dimensional M-estimation
标题:Sparse Polyak:用于高维M估计的自适应步长规则
链接:https://arxiv.org/abs/2509.09802

作者:ao, Marie Maros
摘要:我们提出并研究了Sparse Polyak——Polyak自适应步长的一个变体,旨在解决问题维度允许远快于样本量增长的高维统计估计问题。在此类设置中,标准Polyak步长表现不佳,需要越来越多的迭代才能达到最优统计精度——即使问题仍然条件良好,且可达到的精度本身并不随问题规模而退化。我们将这一局限归因于平滑度度量方式的不匹配:在高维情形下,估计Lipschitz平滑常数不再有效;更合适的做法是估计限制在与问题相关的特定方向上的平滑度(受限Lipschitz平滑常数)。Sparse Polyak通过修改步长来估计受限Lipschitz平滑常数,从而克服了这一问题。我们以理论分析和数值实验支持该方法,证明了其更优的性能。
摘要:We propose and study Sparse Polyak, a variant of Polyak's adaptive step size, designed to solve high-dimensional statistical estimation problems where the problem dimension is allowed to grow much faster than the sample size. In such settings, the standard Polyak step size performs poorly, requiring an increasing number of iterations to achieve optimal statistical precision - even when the problem remains well conditioned and/or the achievable precision itself does not degrade with problem size. We trace this limitation to a mismatch in how smoothness is measured: in high dimensions, it is no longer effective to estimate the Lipschitz smoothness constant. Instead, it is more appropriate to estimate the smoothness restricted to specific directions relevant to the problem (restricted Lipschitz smoothness constant). Sparse Polyak overcomes this issue by modifying the step size to estimate the restricted Lipschitz smoothness constant. We support our approach with both theoretical analysis and numerical experiments, demonstrating its improved performance.
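下面用几行numpy直观对比经典Polyak步长与"受限"变体的差别:当梯度能量集中在少数相关方向、其余高维坐标只贡献弥散噪声时,用全梯度范数作分母会稀释步长,而只在支撑方向上度量范数则不会。此处以给定支撑集近似受限范数,仅为说明性的简化,并非论文的精确步长规则。

```python
import numpy as np

def polyak_step(fx, f_star, grad):
    # 经典Polyak步长: (f(x) - f*) / ||∇f(x)||^2
    return (fx - f_star) / (np.dot(grad, grad) + 1e-12)

def restricted_polyak_step(fx, f_star, grad, support):
    # 示意: 只在与问题相关的支撑方向上度量梯度范数
    g = grad[support]
    return (fx - f_star) / (np.dot(g, g) + 1e-12)

rng = np.random.default_rng(2)
grad = 0.5 * rng.normal(size=10_000)     # 高维、弥散的噪声梯度分量
grad[:10] += 5.0                         # 少数相关方向占主导
fx, f_star = 3.0, 0.0
print(polyak_step(fx, f_star, grad))                            # 被无关维度稀释
print(restricted_polyak_step(fx, f_star, grad, np.arange(10)))  # 步长明显更大
```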


强化学习(7篇)

【1】Mutual Information Tracks Policy Coherence in Reinforcement Learning
标题:互信息追踪强化学习中的策略一致性
链接:https://arxiv.org/abs/2509.10423

作者:eid, Wael Hafez, Amirhossein Nazeri
备注:10 pages, 4 figures, 1 table
摘要:部署在现实环境中的强化学习(RL)智能体面临传感器故障、执行器磨损和环境变化带来的退化,但缺乏检测和诊断这些故障的内在机制。我们提出了一个信息论框架,既揭示了RL的基本动态,又为诊断部署时的异常提供了实用方法。通过分析机器人控制任务中的状态-动作互信息模式,我们首先证明了成功的学习具有特征性的信息签名:尽管状态熵不断增长,状态和动作之间的互信息仍从0.84比特稳步增加到2.83比特(增长238%),表明智能体对任务相关模式发展出越来越强的选择性注意。有趣的是,状态、动作和下一状态的联合互信息MI(S,A;S')遵循倒U型曲线,在学习早期达到峰值,然后随着智能体的专业化而下降,这表明其从广泛探索过渡到高效利用。更具直接可操作性的是,我们表明信息度量可以对系统故障进行差异化诊断:观测空间(即状态)噪声(传感器故障)会在所有信息通道上造成广泛坍缩,状态-动作耦合显著下降;而动作空间噪声(执行器故障)则选择性地破坏动作-结果的可预测性,同时保留状态-动作关系。这种差异化诊断能力通过受控扰动实验得到验证,使得无需修改架构或牺牲性能即可实现精确的故障定位。通过将信息模式确立为学习的签名和系统健康的诊断指标,我们为能够基于信息论原理进行自主故障检测和策略调整的自适应强化学习系统提供了基础。
摘要:Reinforcement Learning (RL) agents deployed in real-world environments face degradation from sensor faults, actuator wear, and environmental shifts, yet lack intrinsic mechanisms to detect and diagnose these failures. We present an information-theoretic framework that reveals both the fundamental dynamics of RL and provides practical methods for diagnosing deployment-time anomalies. Through analysis of state-action mutual information patterns in a robotic control task, we first demonstrate that successful learning exhibits characteristic information signatures: mutual information between states and actions steadily increases from 0.84 to 2.83 bits (238% growth) despite growing state entropy, indicating that agents develop increasingly selective attention to task-relevant patterns. Intriguingly, states, actions and next states joint mutual information, MI(S,A;S'), follows an inverted U-curve, peaking during early learning before declining as the agent specializes suggesting a transition from broad exploration to efficient exploitation. More immediately actionable, we show that information metrics can differentially diagnose system failures: observation-space, i.e., states noise (sensor faults) produces broad collapses across all information channels with pronounced drops in state-action coupling, while action-space noise (actuator faults) selectively disrupts action-outcome predictability while preserving state-action relationships. This differential diagnostic capability demonstrated through controlled perturbation experiments enables precise fault localization without architectural modifications or performance degradation. By establishing information patterns as both signatures of learning and diagnostic for system health, we provide the foundation for adaptive RL systems capable of autonomous fault detection and policy adjustment based on information-theoretic principles.
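摘要中的核心量是状态与动作间的互信息MI(S;A)。下面给出一个基于二维直方图的通用MI估计草图(并非作者所用的估计器),用以演示"策略学成后MI(S;A)上升"这一现象:

```python
import numpy as np

def mutual_information(x, y, bins=16):
    # 基于二维直方图的互信息估计(单位: bit)
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(3)
s = rng.normal(size=5000)                         # 一维"状态"
a_random = rng.normal(size=5000)                  # 未学习: 动作与状态无关
a_policy = 0.9 * s + 0.1 * rng.normal(size=5000)  # 已学习: 动作依赖状态
print(mutual_information(s, a_random))            # 接近0 bit
print(mutual_information(s, a_policy))            # 明显为正
```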


【2】Generalizing Beyond Suboptimality: Offline Reinforcement Learning Learns Effective Scheduling through Random Data
标题:超越次优性的泛化:离线强化学习通过随机数据学习有效调度
链接:https://arxiv.org/abs/2509.10303

作者: Remmerden, Zaharah Bukhsh, Yingqian Zhang
摘要:作业车间调度问题(JSP)和柔性作业车间调度问题(FJSP)是典型的组合优化问题,在工业运营中有着广泛的应用。近年来,许多在线强化学习(RL)方法被提出,用于学习JSP和FJSP的构造式启发式规则。虽然有效,但这些在线RL方法需要与模拟环境进行数百万次交互,而模拟环境可能无法捕捉真实世界的复杂性,并且其随机策略初始化导致样本效率低下。为了解决这些限制,我们引入了保守离散分位数演员-评论家(CDQAC),这是一种新型离线RL算法,可以直接从历史数据中学习有效的调度策略,消除了昂贵在线交互的需要,同时保持了改进次优训练数据的能力。CDQAC将基于分位数的评论家与延迟策略更新相结合,估计每个机器-工序对的回报分布,而不是直接选择配对。我们的大量实验证明了CDQAC从不同数据源中学习的卓越能力。CDQAC始终优于生成原始数据的启发式规则,并超越最先进的离线和在线RL基线。此外,CDQAC具有很高的样本效率,只需10-20个训练实例即可学到高质量的策略。令人惊讶的是,我们发现,与使用遗传算法和优先调度规则产生的更高质量数据训练相比,CDQAC在使用随机启发式生成的数据训练时表现更好。
摘要:The Job-Shop Scheduling Problem (JSP) and Flexible Job-Shop Scheduling Problem (FJSP), are canonical combinatorial optimization problems with wide-ranging applications in industrial operations. In recent years, many online reinforcement learning (RL) approaches have been proposed to learn constructive heuristics for JSP and FJSP. Although effective, these online RL methods require millions of interactions with simulated environments that may not capture real-world complexities, and their random policy initialization leads to poor sample efficiency. To address these limitations, we introduce Conservative Discrete Quantile Actor-Critic (CDQAC), a novel offline RL algorithm that learns effective scheduling policies directly from historical data, eliminating the need for costly online interactions, while maintaining the ability to improve upon suboptimal training data. CDQAC couples a quantile-based critic with a delayed policy update, estimating the return distribution of each machine-operation pair rather than selecting pairs outright. Our extensive experiments demonstrate CDQAC's remarkable ability to learn from diverse data sources. CDQAC consistently outperforms the original data-generating heuristics and surpasses state-of-the-art offline and online RL baselines. In addition, CDQAC is highly sample efficient, requiring only 10-20 training instances to learn high-quality policies. Surprisingly, we find that CDQAC performs better when trained on data generated by a random heuristic than when trained on higher-quality data from genetic algorithms and priority dispatching rules.
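CDQAC的分布式评论家建立在分位数回归之上。下面给出QR-DQN风格的分位数Huber损失的一个通用PyTorch草图,它是此类基于分位数的评论家的标准组成部分;CDQAC特有的保守正则化与延迟策略更新不在其中,张量形状与变量名均为示意。

```python
import torch
import torch.nn.functional as F

def quantile_huber_loss(pred, target, taus, kappa=1.0):
    # pred: [B, N] 预测的N个分位数; target: [B, 1] 回报目标; taus: [1, N]
    u = target - pred                               # TD误差 [B, N]
    huber = F.huber_loss(pred, target.expand_as(pred),
                         delta=kappa, reduction="none")
    weight = torch.abs(taus - (u.detach() < 0).float())  # 非对称分位数权重
    return (weight * huber).mean()

B, N = 32, 8
taus = (torch.arange(N, dtype=torch.float32).view(1, N) + 0.5) / N
pred = torch.randn(B, N, requires_grad=True)
target = torch.randn(B, 1)
loss = quantile_huber_loss(pred, target, taus)
loss.backward()
print(float(loss))
```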


【3】Federated Multi-Agent Reinforcement Learning for Privacy-Preserving and Energy-Aware Resource Management in 6G Edge Networks
标题:用于6G边缘网络中隐私保护和能量感知资源管理的联邦多代理强化学习
链接:https://arxiv.org/abs/2509.10163

作者: Javier Esono Nkulu Andong, Qi Min
摘要:随着第六代(6G)网络向超密集、智能边缘环境发展,在严格的隐私、移动性和能源限制下进行高效的资源管理变得至关重要。本文介绍了一种新的联邦多智能体强化学习(Fed-MARL)框架,该框架结合了MAC层和应用层的跨层编排,以实现跨异构边缘设备的节能、隐私保护和实时资源管理。每个智能体使用深度递归Q网络(DRQN),基于本地观察(例如队列长度、能量、CPU使用率和移动性)学习任务卸载、频谱接入和CPU能量自适应的分散策略。为了保护隐私,我们引入了一个基于椭圆曲线Diffie-Hellman密钥交换的安全聚合协议,该协议可以确保准确的模型更新,而不会将原始数据暴露给半诚实的对手。我们将资源管理问题表述为部分可观察的多智能体马尔可夫决策过程(POMMDP),其多目标奖励函数在URLLC、eMBB和mMTC等6G特定服务要求下联合优化延迟、能源效率、频谱效率、公平性和可靠性。仿真结果表明,Fed-MARL在任务成功率、延迟、能源效率和公平性方面优于集中式MARL和启发式基线,同时确保在动态、资源受限的6G边缘网络中实现稳健的隐私保护和可扩展性。
摘要:As sixth-generation (6G) networks move toward ultra-dense, intelligent edge environments, efficient resource management under stringent privacy, mobility, and energy constraints becomes critical. This paper introduces a novel Federated Multi-Agent Reinforcement Learning (Fed-MARL) framework that incorporates cross-layer orchestration of both the MAC layer and application layer for energy-efficient, privacy-preserving, and real-time resource management across heterogeneous edge devices. Each agent uses a Deep Recurrent Q-Network (DRQN) to learn decentralized policies for task offloading, spectrum access, and CPU energy adaptation based on local observations (e.g., queue length, energy, CPU usage, and mobility). To protect privacy, we introduce a secure aggregation protocol based on elliptic curve Diffie-Hellman key exchange, which ensures accurate model updates without exposing raw data to semi-honest adversaries. We formulate the resource management problem as a partially observable multi-agent Markov decision process (POMMDP) with a multi-objective reward function that jointly optimizes latency, energy efficiency, spectral efficiency, fairness, and reliability under 6G-specific service requirements such as URLLC, eMBB, and mMTC. Simulation results demonstrate that Fed-MARL outperforms centralized MARL and heuristic baselines in task success rate, latency, energy efficiency, and fairness, while ensuring robust privacy protection and scalability in dynamic, resource-constrained 6G edge networks.


【4】Off Policy Lyapunov Stability in Reinforcement Learning
标题:强化学习中的离策略李雅普诺夫稳定性
链接:https://arxiv.org/abs/2509.09863

作者:ll, Daniela Constantinescu
备注:Conference on Robot Learning (CORL) 2025
摘要:传统的强化学习缺乏提供稳定性保证的能力。较新的算法在学习控制策略的同时学习李雅普诺夫函数,以确保稳定的学习。然而,由于其同策略(on-policy)性质,目前自学习的李雅普诺夫函数样本效率低下。本文提出了一种离策略学习李雅普诺夫函数的方法,并将所提出的离策略李雅普诺夫函数融入Soft Actor-Critic和近端策略优化(PPO)算法,为其提供数据高效的稳定性证书。倒立摆和四旋翼飞行器的仿真结果表明,配备所提出的离策略李雅普诺夫函数后,这两种算法的性能均有所提升。
摘要:Traditional reinforcement learning lacks the ability to provide stability guarantees. More recent algorithms learn Lyapunov functions alongside the control policies to ensure stable learning. However, the current self-learned Lyapunov functions are sample inefficient due to their on-policy nature. This paper introduces a method for learning Lyapunov functions off-policy and incorporates the proposed off-policy Lyapunov function into the Soft Actor Critic and Proximal Policy Optimization algorithms to provide them with a data efficient stability certificate. Simulations of an inverted pendulum and a quadrotor illustrate the improved performance of the two algorithms when endowed with the proposed off-policy Lyapunov function.
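自学习李雅普诺夫函数的共同要素是一个"沿转移下降"的约束。下面是一个假设性的PyTorch草图:V由网络输出的平方保证非负,损失惩罚V(s')−V(s)+margin的正部;由于只需(s, s')转移对,这类损失原则上可直接在离策略的回放数据上评估(这正是摘要所强调的方向),但具体算法细节以论文为准。

```python
import torch
import torch.nn as nn

class LyapunovNet(nn.Module):
    # 输出非负的候选李雅普诺夫值 V(s) (网络输出取平方)
    def __init__(self, state_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                                 nn.Linear(64, 1))

    def forward(self, s):
        return self.net(s).pow(2).squeeze(-1)

def lyapunov_decrease_loss(V, s, s_next, margin=1e-3):
    # 惩罚沿转移 (s -> s') 未下降的部分: max(0, V(s') - V(s) + margin)
    return torch.relu(V(s_next) - V(s) + margin).mean()

V = LyapunovNet(4)
opt = torch.optim.Adam(V.parameters(), lr=1e-3)
s = torch.randn(256, 4)          # 可来自离策略回放缓冲区的转移对
s_next = 0.9 * s                 # 玩具稳定动态: 状态逐步收缩
loss = lyapunov_decrease_loss(V, s, s_next)
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```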


【5】Revisiting Actor-Critic Methods in Discrete Action Off-Policy Reinforcement Learning
标题:重新审视离散动作离策略强化学习中的演员-评论家方法
链接:https://arxiv.org/abs/2509.09838

作者:, Reza Babanezhad, Sharan Vaswani
摘要:在Atari等离散动作环境中,DQN等基于价值的方法是离策略强化学习的默认方法。常见的基于策略的方法要么是同策略的、无法有效地从离策略数据中学习(例如PPO),要么在离散动作设置中经验性能较差(例如SAC)。因此,我们从离散SAC(DSAC)出发,重新审视这种设置下演员-评论家方法的设计。首先,我们确定演员和评论家熵之间的耦合是DSAC性能不佳的主要原因。我们证明,仅仅通过解耦这些组件,DSAC就可以达到与DQN相当的性能。受此启发,我们引入了一个灵活的离策略演员-评论家框架,将DSAC作为其特例。我们的框架允许使用m步Bellman算子进行评论家更新,并能将标准策略优化方法与熵正则化相结合,以实例化所得到的演员目标。理论上,我们证明了所提出的方法在表格设置中可以保证收敛到最优正则化价值函数。经验上,我们证明了这些方法可以接近DQN在标准Atari游戏上的性能,即使没有熵正则化或显式探索也是如此。
摘要:Value-based approaches such as DQN are the default methods for off-policy reinforcement learning with discrete-action environments such as Atari. Common policy-based methods are either on-policy and do not effectively learn from off-policy data (e.g. PPO), or have poor empirical performance in the discrete-action setting (e.g. SAC). Consequently, starting from discrete SAC (DSAC), we revisit the design of actor-critic methods in this setting. First, we determine that the coupling between the actor and critic entropy is the primary reason behind the poor performance of DSAC. We demonstrate that by merely decoupling these components, DSAC can have comparable performance as DQN. Motivated by this insight, we introduce a flexible off-policy actor-critic framework that subsumes DSAC as a special case. Our framework allows using an m-step Bellman operator for the critic update, and enables combining standard policy optimization methods with entropy regularization to instantiate the resulting actor objective. Theoretically, we prove that the proposed methods can guarantee convergence to the optimal regularized value function in the tabular setting. Empirically, we demonstrate that these methods can approach the performance of DQN on standard Atari games, and do so even without entropy regularization or explicit exploration.


【6】Meta-Learning Reinforcement Learning for Crypto-Return Prediction
标题:用于加密回报预测的元学习强化学习
链接:https://arxiv.org/abs/2509.09751

作者:ang, Zhaoyang Guan, Guanyu Liu, Tianze Xia, Xianzhi Li, Shuo Yin, Xinyuan Song, Chuhan Cheng, Tianyu Shi, Alex Lee
摘要:预测加密货币的回报是出了名的困难:价格波动由链上活动、新闻流和社会情绪的快速变化共同驱动,而标记的训练数据既稀缺又昂贵。在本文中,我们提出了Meta-RL-Crypto,这是一个统一的基于transformer的架构,它将元学习和强化学习(RL)结合起来,以创建一个完全自我改进的交易智能体。从一个普通的指令微调LLM开始,智能体在闭环架构中的三个角色(演员、评判者和元评判者)之间迭代交替。这个学习过程不需要额外的人类监督,并且可以利用多模态市场输入和内部偏好反馈。系统中的智能体不断完善交易策略和评估标准。不同市场机制下的实验表明,Meta-RL-Crypto在真实市场的技术指标上表现良好,并优于其他基于LLM的基线。
摘要:Predicting cryptocurrency returns is notoriously difficult: price movements are driven by a fast-shifting blend of on-chain activity, news flow, and social sentiment, while labeled training data are scarce and expensive. In this paper, we present Meta-RL-Crypto, a unified transformer-based architecture that unifies meta-learning and reinforcement learning (RL) to create a fully self-improving trading agent. Starting from a vanilla instruction-tuned LLM, the agent iteratively alternates between three roles-actor, judge, and meta-judge-in a closed-loop architecture. This learning process requires no additional human supervision. It can leverage multimodal market inputs and internal preference feedback. The agent in the system continuously refines both the trading policy and evaluation criteria. Experiments across diverse market regimes demonstrate that Meta-RL-Crypto shows good performance on the technical indicators of the real market and outperforming other LLM-based baselines.


【7】Reinforcement learning for spin torque oscillator tasks
标题:自旋转矩振荡器任务的强化学习
链接:https://arxiv.org/abs/2509.10057

作者:siejuk, Sławomir Ziętek, Witold Skowroński
备注:3 figures, 6 pages
摘要:我们通过强化学习(RL)解决自旋电子振荡器(STO)的自动同步问题。我们使用宏自旋Landau-Lifschitz-Gilbert-Slonczewski方程的数值解来模拟STO,并训练两种类型的RL智能体,使其在固定步数内与目标频率同步。我们探索了对这一基础任务的修改,并表明在模拟环境中可以轻松实现同步收敛性和能量效率两方面的改进。
摘要:We address the problem of automatic synchronisation of the spintronic oscillator (STO) by means of reinforcement learning (RL). A numerical solution of the macrospin Landau-Lifschitz-Gilbert-Slonczewski equation is used to simulate the STO and we train the two types of RL agents to synchronise with a target frequency within a fixed number of steps. We explore modifications to this base task and show an improvement in both convergence and energy efficiency of the synchronisation that can be easily achieved in the simulated environment.


元学习(1篇)

【1】DB3 Team's Solution For Meta KDD Cup' 25
标题:DB3团队针对Meta KDD Cup'25的解决方案
链接:https://arxiv.org/abs/2509.09681

作者:a, Jiazun Chen, Yirui Zhan, Suifeng Zhao, Weipeng Jiang, Chaorui Zhang, Wei Han, Bo Bai, Jun Gao
摘要:本文介绍了db3团队在KDD Cup'25的Meta CRAG-MM Challenge 2025中的获胜方案。为了应对该挑战独特的多模态、多轮问答基准(CRAG-MM),我们开发了一个综合框架,该框架将针对不同任务定制的检索管道与用于幻觉控制的统一LLM调优方法集成在一起。我们的解决方案具有以下特点:(1)特定领域的检索管道,处理图像索引的知识图谱、Web资源和多轮对话;(2)使用SFT、DPO和RL的高级拒绝训练。该系统在任务1中获得第二名,在任务2中获得第二名,在任务3中获得第一名,凭借对第一人称视角挑战的出色处理,获得了以自我为中心查询的卓越大奖。
摘要:This paper presents the db3 team's winning solution for the Meta CRAG-MM Challenge 2025 at KDD Cup'25. Addressing the challenge's unique multi-modal, multi-turn question answering benchmark (CRAG-MM), we developed a comprehensive framework that integrates tailored retrieval pipelines for different tasks with a unified LLM-tuning approach for hallucination control. Our solution features (1) domain-specific retrieval pipelines handling image-indexed knowledge graphs, web sources, and multi-turn conversations; and (2) advanced refusal training using SFT, DPO, and RL. The system achieved 2nd place in Task 1, 2nd place in Task 2, and 1st place in Task 3, securing the grand prize for excellence in ego-centric queries through superior handling of first-person perspective challenges.


符号|符号学习(2篇)

【1】KAN-SR: A Kolmogorov-Arnold Network Guided Symbolic Regression Framework
标题:KAN-SR:Kolmogorov-Arnold网络引导的符号回归框架
链接:https://arxiv.org/abs/2509.10089

作者:rea Bühler, Gonzalo Guillén-Gosálbez
摘要:我们介绍了一种新的符号回归框架KAN-SR,它建立在Kolmogorov-Arnold网络(KAN)之上,遵循分治方法。符号回归搜索最适合给定数据集的数学方程,通常用遗传编程方法求解。我们表明,通过使用深度学习技术(更具体地说是KAN),并将其与平移对称性和可分离性等简化策略相结合,我们能够恢复费曼科学发现符号回归(SRSD)数据集的真值方程。此外,我们表明,通过将所提出的框架与神经受控微分方程相结合,我们能够精确地建模一个计算机模拟(in-silico)生物过程系统的动力学,为其他工程系统的动态建模打开了大门。
摘要:We introduce a novel symbolic regression framework, namely KAN-SR, built on Kolmogorov Arnold Networks (KANs) which follows a divide-and-conquer approach. Symbolic regression searches for mathematical equations that best fit a given dataset and is commonly solved with genetic programming approaches. We show that by using deep learning techniques, more specific KANs, and combining them with simplification strategies such as translational symmetries and separabilities, we are able to recover ground-truth equations of the Feynman Symbolic Regression for Scientific Discovery (SRSD) dataset. Additionally, we show that by combining the proposed framework with neural controlled differential equations, we are able to model the dynamics of an in-silico bioprocess system precisely, opening the door for the dynamic modeling of other engineering systems.


【2】Symbolic Feedforward Networks for Probabilistic Finite Automata: Exact Simulation and Learnability
标题:概率有限自动机的符号前馈网络:精确模拟与可学习性
链接:https://arxiv.org/abs/2509.10034

作者:esh Dhayalkar
备注:19 pages, 2 figures
摘要:我们提出了一个形式化且构造性的理论,表明概率有限自动机(PFA)可以用符号前馈神经网络精确模拟。我们的架构将状态分布表示为向量、将转移表示为随机矩阵,通过矩阵-向量乘积实现概率状态传播。这产生了一种使用软更新、无需循环结构的并行、可解释且可微的PFA动态模拟。我们通过分层符号计算正式刻画了概率子集构造、$\varepsilon$-闭包和精确模拟,并证明了PFA与特定类别神经网络之间的等价性。我们进一步表明,这些符号模拟器不仅具有表达能力,而且是可学习的:在带标签的序列数据上使用标准的基于梯度下降的优化进行训练,它们能够恢复真值PFA的确切行为。这种可学习性(在命题5.1中形式化)是这项工作的关键。我们的结果在严格的代数框架下将概率自动机理论与神经架构统一起来,弥合了符号计算和深度学习之间的差距。
摘要:We present a formal and constructive theory showing that probabilistic finite automata (PFAs) can be exactly simulated using symbolic feedforward neural networks. Our architecture represents state distributions as vectors and transitions as stochastic matrices, enabling probabilistic state propagation via matrix-vector products. This yields a parallel, interpretable, and differentiable simulation of PFA dynamics using soft updates-without recurrence. We formally characterize probabilistic subset construction, $\varepsilon$-closure, and exact simulation via layered symbolic computation, and prove equivalence between PFAs and specific classes of neural networks. We further show that these symbolic simulators are not only expressive but learnable: trained with standard gradient descent-based optimization on labeled sequence data, they recover the exact behavior of ground-truth PFAs. This learnability, formalized in Proposition 5.1, is the crux of this work. Our results unify probabilistic automata theory with neural architectures under a rigorous algebraic framework, bridging the gap between symbolic computation and deep learning.
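摘要的核心构造——"状态分布为向量、转移为随机矩阵、通过矩阵-向量乘积做概率状态传播"——可以用几行numpy直接演示(一个两状态玩具PFA,符号与矩阵均为虚构):

```python
import numpy as np

# 两状态PFA: 每个输入符号对应一个行随机转移矩阵
T = {
    "a": np.array([[0.9, 0.1],
                   [0.2, 0.8]]),
    "b": np.array([[0.5, 0.5],
                   [0.0, 1.0]]),
}
pi = np.array([1.0, 0.0])       # 初始状态分布
accept = np.array([0.0, 1.0])   # 接受状态指示向量

def acceptance_probability(word):
    # 概率状态传播 = 一串矩阵-向量乘积, 对应摘要中的"软"前馈层
    d = pi.copy()
    for sym in word:
        d = d @ T[sym]          # 分布右乘行随机矩阵, 概率质量守恒
        assert np.isclose(d.sum(), 1.0)
    return float(d @ accept)

print(acceptance_probability("aab"))
print(acceptance_probability("bbb"))
```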


医学相关(3篇)

【1】BenchECG and xECG: a benchmark and baseline for ECG foundation models
标题:BenchECG和xECG:心电图基础模型的基准和基线
链接:https://arxiv.org/abs/2509.10151

作者:Lunelli, Angus Nicolson, Samuel Martin Pröll, Sebastian Johannes Reinstadler, Axel Bauer, Clemens Dlaska
备注:32 pages, 4 figures, 22 tables
摘要:心电图(ECG)价格便宜,使用广泛,非常适合深度学习。最近,人们对开发ECG基础模型的兴趣越来越大——即能够泛化到各种下游任务的模型。然而,一直缺乏一致的评估:以前的工作经常使用狭窄的任务选择和不一致的数据集,阻碍了公平比较。在这里,我们介绍BenchECG,这是一个标准化的基准,包括一套全面的公开可用ECG数据集和多样化的任务。我们还提出了xECG,这是一种使用SimDINOv2自监督学习训练的基于xLSTM的循环模型,与公开的最先进模型相比,它实现了最佳的BenchECG评分。特别是,xECG是唯一一个在所有数据集和任务上都表现出色的公开可用模型。通过标准化评估,BenchECG可以进行严格的比较,旨在加速ECG表示学习的进展。xECG实现了优于早期方法的性能,为未来的ECG基础模型定义了新的基线。
摘要:Electrocardiograms (ECGs) are inexpensive, widely used, and well-suited to deep learning. Recently, interest has grown in developing foundation models for ECGs - models that generalise across diverse downstream tasks. However, consistent evaluation has been lacking: prior work often uses narrow task selections and inconsistent datasets, hindering fair comparison. Here, we introduce BenchECG, a standardised benchmark comprising a comprehensive suite of publicly available ECG datasets and versatile tasks. We also propose xECG, an xLSTM-based recurrent model trained with SimDINOv2 self-supervised learning, which achieves the best BenchECG score compared to publicly available state-of-the-art models. In particular, xECG is the only publicly available model to perform strongly on all datasets and tasks. By standardising evaluation, BenchECG enables rigorous comparison and aims to accelerate progress in ECG representation learning. xECG achieves superior performance over earlier approaches, defining a new baseline for future ECG foundation models.


【2】Multi-pathology Chest X-ray Classification with Rejection Mechanisms
标题:具有拒绝机制的多病理胸部X射线分类
链接:https://arxiv.org/abs/2509.10348

作者:perstein, Amit Tzahar, Alon Gottlib, Tal Verber, Ravit Shagan Damti, Alexander Apartsin
备注:12 pages, 4 figures
摘要:深度学习模型的过度自信在高风险医学成像任务中构成重大风险,特别是在胸部X光片的多标签分类中,必须同时检测多种共存的病理。本研究介绍了一种基于DenseNet-121主干的胸部X射线诊断不确定性感知框架,并辅以两种选择性预测机制:基于熵的拒绝和基于置信区间的拒绝。这两种方法都使模型能够对不确定的预测弃权,通过将模棱两可的病例移交给临床专家来提高可靠性。采用基于分位数的校准程序,使用全局或类特定策略来调整拒绝阈值。在三个大型公共数据集(PadChest、NIH ChestX-ray14和MIMIC-CXR)上进行的实验表明,选择性拒绝改善了诊断准确性和覆盖率之间的权衡,其中基于熵的拒绝在所有病理中产生最高的平均AUC。这些结果支持将选择性预测集成到人工智能辅助诊断工作流程中,为在临床环境中更安全、不确定性感知地部署深度学习迈出了实际一步。
摘要:Overconfidence in deep learning models poses a significant risk in high-stakes medical imaging tasks, particularly in multi-label classification of chest X-rays, where multiple co-occurring pathologies must be detected simultaneously. This study introduces an uncertainty-aware framework for chest X-ray diagnosis based on a DenseNet-121 backbone, enhanced with two selective prediction mechanisms: entropy-based rejection and confidence interval-based rejection. Both methods enable the model to abstain from uncertain predictions, improving reliability by deferring ambiguous cases to clinical experts. A quantile-based calibration procedure is employed to tune rejection thresholds using either global or class-specific strategies. Experiments conducted on three large public datasets (PadChest, NIH ChestX-ray14, and MIMIC-CXR) demonstrate that selective rejection improves the trade-off between diagnostic accuracy and coverage, with entropy-based rejection yielding the highest average AUC across all pathologies. These results support the integration of selective prediction into AI-assisted diagnostic workflows, providing a practical step toward safer, uncertainty-aware deployment of deep learning in clinical settings.
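下面给出基于熵的拒绝与分位数校准的一个最小numpy草图:对多标签sigmoid输出求伯努利熵之和作为不确定性,在校准集上按目标拒绝率取熵的分位数作阈值,超过阈值的样本移交专家(以-1表示)。数据与阈值策略均为示意,并非论文的具体配置。

```python
import numpy as np

def multilabel_entropy(probs, eps=1e-12):
    # 各标签伯努利熵之和, 作为样本级不确定性
    p = np.clip(probs, eps, 1 - eps)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p)).sum(axis=1)

def calibrate_threshold(calib_probs, reject_rate=0.2):
    # 取校准集熵的分位数, 使期望拒绝率约为 reject_rate
    return np.quantile(multilabel_entropy(calib_probs), 1 - reject_rate)

def predict_or_reject(probs, tau):
    h = multilabel_entropy(probs)
    n_pos = (probs > 0.5).sum(axis=1)          # 每个样本预测出的病理数
    return np.where(h <= tau, n_pos, -1)       # -1 表示拒绝并移交专家

rng = np.random.default_rng(4)
calib = rng.uniform(size=(1000, 14))   # 14类病理的sigmoid输出(随机示意)
test = rng.uniform(size=(5, 14))
tau = calibrate_threshold(calib, reject_rate=0.2)
print(predict_or_reject(test, tau))
```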


【3】Engineering Spatial and Molecular Features from Cellular Niches to Inform Predictions of Inflammatory Bowel Disease
标题:从细胞小生境构建空间和分子特征,为炎症性肠病的预测提供信息
链接:https://arxiv.org/abs/2509.09923

作者:hua Toledo Tan, Maria Kapetanaki, Panayiotis V. Benos
备注:18 pages, 7 figures, 7 tables. Submitted to the 25th BNAIC Conference, Namur, Belgium, November 19 - 21, 2025
摘要:由于临床表现重叠,区分炎症性肠病(IBD)的两种主要亚型——克罗恩病(CD)和溃疡性结肠炎(UC)——是一个持续的临床挑战。这项研究介绍了一种新的计算框架,采用空间转录组学(ST)创建一个可解释的IBD分类机器学习模型。我们分析了来自健康对照(HC)、UC和CD患者结肠粘膜的ST数据。使用非负矩阵分解(NMF),我们首先确定了四个重复出现的细胞小生境,代表组织内不同的功能微环境。从这些小生境中,我们系统地构建了44个特征,捕获组织病理学的三个关键方面:小生境组成、邻域富集和小生境-基因信号。在这些特征上训练的多层感知机(MLP)分类器,在更具挑战性的三分类问题(HC、UC和CD)中达到了0.774 +/- 0.161的准确率,在区分IBD与健康组织的二分类问题中达到了0.916 +/- 0.118的准确率。至关重要的是,模型可解释性分析显示,小生境空间组织的破坏是一般性炎症的最强预测因子,而UC和CD之间的分类则依赖于特定的小生境-基因表达特征。这项工作提供了一个稳健的概念验证管道,将描述性空间数据转换为准确且可解释的预测工具,不仅提供了一个潜在的新诊断范式,还为驱动IBD亚型的独特生物学机制提供了更深入的见解。
摘要:Differentiating between the two main subtypes of Inflammatory Bowel Disease (IBD): Crohns disease (CD) and ulcerative colitis (UC) is a persistent clinical challenge due to overlapping presentations. This study introduces a novel computational framework that employs spatial transcriptomics (ST) to create an explainable machine learning model for IBD classification. We analyzed ST data from the colonic mucosa of healthy controls (HC), UC, and CD patients. Using Non-negative Matrix Factorization (NMF), we first identified four recurring cellular niches, representing distinct functional microenvironments within the tissue. From these niches, we systematically engineered 44 features capturing three key aspects of tissue pathology: niche composition, neighborhood enrichment, and niche-gene signals. A multilayer perceptron (MLP) classifier trained on these features achieved an accuracy of 0.774 +/- 0.161 for the more challenging three-class problem (HC, UC, and CD) and 0.916 +/- 0.118 in the two-class problem of distinguishing IBD from healthy tissue. Crucially, model explainability analysis revealed that disruptions in the spatial organization of niches were the strongest predictors of general inflammation, while the classification between UC and CD relied on specific niche-gene expression signatures. This work provides a robust, proof-of-concept pipeline that transforms descriptive spatial data into an accurate and explainable predictive tool, offering not only a potential new diagnostic paradigm but also deeper insights into the distinct biological mechanisms that drive IBD subtypes.
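摘要中"用NMF识别重复出现的细胞小生境并从中构建特征"的流程,可以用sklearn的NMF给出一个极简示意(数据为随机生成的spot×细胞类型矩阵,44个特征中这里只演示"小生境组成"一类):

```python
import numpy as np
from sklearn.decomposition import NMF

# 示意数据: 每行一个空间spot, 每列一种细胞类型的计数
rng = np.random.default_rng(5)
X = rng.poisson(3.0, size=(500, 12)).astype(float)

k = 4                                        # 小生境数量
nmf = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)                     # [spot x 小生境] 负荷
H = nmf.components_                          # [小生境 x 细胞类型] 组成

niche_of_spot = W.argmax(axis=1)             # 每个spot的主导小生境
composition = np.bincount(niche_of_spot, minlength=k) / len(niche_of_spot)
print(composition)                           # 样本级"小生境组成"特征(示意)
```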


蒸馏|知识提取(3篇)

【1】Improving MLLM Historical Record Extraction with Test-Time Image
标题:利用测试时图像改进MLLM历史记录提取
链接:https://arxiv.org/abs/2509.09722

作者:chibald, Tony Martinez
摘要:我们提出了一种新的集成框架,用于稳定基于LLM的嘈杂历史文档文本提取。我们用Gemini 2.0 Flash转录每幅图像的多个增强变体,并用自定义的Needleman-Wunsch风格对齐器融合这些输出,该对齐器同时产生共识转录和置信度得分。我们提出了一个包含622份宾夕法尼亚州死亡记录的新数据集,并证明相对于单次(single-shot)基线,我们的方法将转录准确率提高了4个百分点。我们发现填充和模糊对提高准确率最有用,而网格扭曲扰动最适合区分高置信度与低置信度的情形。该方法简单、可扩展,并可立即部署到其他文档集合和转录模型。
摘要:We present a novel ensemble framework that stabilizes LLM based text extraction from noisy historical documents. We transcribe multiple augmented variants of each image with Gemini 2.0 Flash and fuse these outputs with a custom Needleman Wunsch style aligner that yields both a consensus transcription and a confidence score. We present a new dataset of 622 Pennsylvania death records, and demonstrate our method improves transcription accuracy by 4 percentage points relative to a single shot baseline. We find that padding and blurring are the most useful for improving accuracy, while grid warp perturbations are best for separating high and low confidence cases. The approach is simple, scalable, and immediately deployable to other document collections and transcription models.
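论文使用自定义的Needleman-Wunsch风格对齐器融合多个转录结果。下面给出经典成对Needleman-Wunsch全局对齐的一个numpy草图,并用对齐后的逐列一致率作为置信度的粗略代理;论文中的多序列共识与置信度打分细节并未公开于摘要,此处仅演示对齐核心:

```python
import numpy as np

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    # 经典Needleman-Wunsch全局对齐, 返回两条带'-'的对齐串
    n, m = len(a), len(b)
    S = np.zeros((n + 1, m + 1))
    S[:, 0] = np.arange(n + 1) * gap
    S[0, :] = np.arange(m + 1) * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = S[i-1, j-1] + (match if a[i-1] == b[j-1] else mismatch)
            S[i, j] = max(d, S[i-1, j] + gap, S[i, j-1] + gap)
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:     # 回溯
        diag = S[i-1, j-1] + (match if a[i-1] == b[j-1] else mismatch) if i > 0 and j > 0 else None
        if i > 0 and j > 0 and S[i, j] == diag:
            out_a.append(a[i-1]); out_b.append(b[j-1]); i -= 1; j -= 1
        elif i > 0 and S[i, j] == S[i-1, j] + gap:
            out_a.append(a[i-1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j-1]); j -= 1
    return ''.join(reversed(out_a)), ''.join(reversed(out_b))

a1, a2 = needleman_wunsch("John Smith 1897", "Jon Smith 1897")
agreement = np.mean([x == y for x, y in zip(a1, a2)])
print(a1)
print(a2)
print(round(float(agreement), 3))   # 逐列一致率, 作为置信度的粗略代理
```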


【2】AEGIS: An Agent for Extraction and Geographic Identification in Scholarly Proceedings
标题:AEGIS:学术论文集中的提取和地理识别代理
链接:https://arxiv.org/abs/2509.09470

作者:h, Harshad Khadilkar, Deepak Akkil
备注:5 pages, 2 figures
摘要:跟上学术文献的快速增长,对研究人员、资助机构和学术团体提出了重大挑战。为了解决学术发现所需的耗时人工工作,我们提出了一种新颖的、完全自动化的系统,实现从数据发现到直接行动的过渡。我们的流水线演示了一个专门的人工智能代理"Agent-E"如何在会议论文集中识别来自特定地理区域的论文,然后执行机器人流程自动化(RPA)来完成预定义的操作,例如提交提名表格。我们在来自五个不同会议的586篇论文上验证了我们的系统,它成功地识别了每一篇目标论文,召回率为100%,准确率接近99.4%。该演示突出了面向任务的人工智能代理的潜力,它们不仅可以过滤信息,还可以积极参与并加速学术界的工作流程。
摘要:Keeping pace with the rapid growth of academia literature presents a significant challenge for researchers, funding bodies, and academic societies. To address the time-consuming manual effort required for scholarly discovery, we present a novel, fully automated system that transitions from data discovery to direct action. Our pipeline demonstrates how a specialized AI agent, 'Agent-E', can be tasked with identifying papers from specific geographic regions within conference proceedings and then executing a Robotic Process Automation (RPA) to complete a predefined action, such as submitting a nomination form. We validated our system on 586 papers from five different conferences, where it successfully identified every target paper with a recall of 100% and a near perfect accuracy of 99.4%. This demonstration highlights the potential of task-oriented AI agents to not only filter information but also to actively participate in and accelerate the workflows of the academic community.


【3】Unified Learnable 2D Convolutional Feature Extraction for ASR
标题:用于ASR的统一可学习2D卷积特征提取
链接:https://arxiv.org/abs/2509.10031

作者:ting, Benedikt Hilmes, Ralf Schlüter, Hermann Ney
备注:Accepted at ITG Conference on Speech Communication 2025
摘要:神经前端代表了自动语音识别(ASR)系统特征提取的一种有前途的方法,因为它们能够为不同的任务学习专门定制的特征。然而,许多现有的技术仍然受到经典方法的严重影响。虽然这种归纳偏差可能会简化系统设计,但我们的工作旨在开发一个更通用的特征提取前端。此外,我们寻求统一前端架构,与应用源自不同来源的多层拓扑组合的现有方法形成鲜明对比。实验系统地展示了如何减少现有技术的影响,以实现通用的前端。由此产生的2D卷积前端是参数高效的,适用于计算资源有限的场景,而不像在未标记音频上预先训练的大型模型。结果表明,这种通用的统一方法不仅是可行的,而且与现有的监督可学习特征提取器的性能相匹配。
摘要:Neural front-ends represent a promising approach to feature extraction for automatic speech recognition (ASR) systems as they enable to learn specifically tailored features for different tasks. Yet, many of the existing techniques remain heavily influenced by classical methods. While this inductive bias may ease the system design, our work aims to develop a more generic front-end for feature extraction. Furthermore, we seek to unify the front-end architecture contrasting with existing approaches that apply a composition of several layer topologies originating from different sources. The experiments systematically show how to reduce the influence of existing techniques to achieve a generic front-end. The resulting 2D convolutional front-end is parameter-efficient and suitable for a scenario with limited computational resources unlike large models pre-trained on unlabeled audio. The results demonstrate that this generic unified approach is not only feasible but also matches the performance of existing supervised learnable feature extractors.


推荐(1篇)

【1】Model-agnostic post-hoc explainability for recommender systems
标题:推荐系统的模型不可知事后解释性
链接:https://arxiv.org/abs/2509.10245

作者:valo, Jose L Salmeron
备注:None
摘要:推荐系统通常受益于复杂的特征嵌入和深度学习算法,这些算法可以提供复杂的推荐,从而增强用户体验、参与度和收入。然而,这些方法经常降低系统的可解释性和透明度。在这项研究中,我们在推荐设置中对删除诊断(deletion diagnostics)进行了系统的应用、改造和评估。该方法将模型的性能与在没有特定用户或物品的情况下训练的类似模型的性能进行比较,从而使我们能够量化该观测对推荐器的正面或负面影响。为了证明其模型无关性,该提案被应用于神经协同过滤(NCF)——一种广泛使用的基于深度学习的推荐器,以及奇异值分解(SVD)——一种经典的协同过滤技术。MovieLens和Amazon Reviews数据集上的实验提供了对模型行为的深入了解,并突出了该方法在不同推荐范式中的通用性。
摘要:Recommender systems often benefit from complex feature embeddings and deep learning algorithms, which deliver sophisticated recommendations that enhance user experience, engagement, and revenue. However, these methods frequently reduce the interpretability and transparency of the system. In this research, we develop a systematic application, adaptation, and evaluation of deletion diagnostics in the recommender setting. The method compares the performance of a model to that of a similar model trained without a specific user or item, allowing us to quantify how that observation influences the recommender, either positively or negatively. To demonstrate its model-agnostic nature, the proposal is applied to both Neural Collaborative Filtering (NCF), a widely used deep learning-based recommender, and Singular Value Decomposition (SVD), a classical collaborative filtering technique. Experiments on the MovieLens and Amazon Reviews datasets provide insights into model behavior and highlight the generality of the approach across different recommendation paradigms.
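删除诊断的思路——"去掉某个用户重新训练,对比性能变化以量化其影响"——可以用一个极简的偏置基线推荐器来演示。下面的numpy草图仅为示意(论文实际应用于NCF与SVD),影响度量方式也是假设性的简化:

```python
import numpy as np

def fit_bias_model(R):
    # 极简推荐模型: 全局均值 + 用户偏置 + 物品偏置 (NaN表示缺失评分)
    mu = np.nanmean(R)
    bu = np.nan_to_num(np.nanmean(R - mu, axis=1))
    bi = np.nan_to_num(np.nanmean(R - mu - bu[:, None], axis=0))
    return mu, bu, bi

def rmse(R, mu, bu, bi):
    pred = mu + bu[:, None] + bi[None, :]
    mask = ~np.isnan(R)
    return np.sqrt(np.mean((R[mask] - pred[mask]) ** 2))

rng = np.random.default_rng(6)
R = rng.uniform(1, 5, size=(50, 40))
R[rng.uniform(size=R.shape) < 0.7] = np.nan    # 70%评分缺失

base = rmse(R, *fit_bias_model(R))
for u in range(3):                             # 删除诊断: 去掉用户u再训练
    R_minus = np.delete(R, u, axis=0)
    # 差值粗略衡量用户u对整体拟合质量的影响方向与大小
    print(u, round(rmse(R_minus, *fit_bias_model(R_minus)) - base, 4))
```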


聚类(1篇)

【1】HypoGeneAgent: A Hypothesis Language Agent for Gene-Set Cluster Resolution Selection Using Perturb-seq Datasets
标题:HypoGeneAgent:一种使用Perturb-seq数据集进行基因集聚类分辨率选择的假设语言代理
链接:https://arxiv.org/abs/2509.09740

作者:, Xing-Yue Monica Ge, Aaron Archer Waterman, Tommaso Biancalani, David Richmond, Yogesh Pandit, Avtar Singh, Russell Littman, Jin Liu, Jan-Christian Huetter, Vladimir Ermakov
摘要:大规模的单细胞和Perturb-seq研究通常涉及对细胞进行聚类,然后用基因本体论(GO)术语注释每个聚类,以阐明潜在的生物学程序。然而,分辨率选择和功能注释这两个阶段本质上都是主观的,依赖于启发式方法和专家策展。我们提出了HYPOGENEAGENT,一个大型语言模型(LLM)驱动的框架,将聚类注释转化为一个可定量优化的任务。首先,作为基因集分析师的LLM分析每个基因程序或扰动模块的内容,并生成基于GO的假设的排名列表,以及校准的置信度得分。随后,我们用一个句子嵌入模型嵌入每个预测的描述,计算成对余弦相似度,并让代理评审小组对以下两项打分:(i)预测的内部一致性,即同一聚类内的高平均相似度,称为聚类内一致性;(ii)它们的外部独特性,即聚类之间的低相似度,称为聚类间分离。这两个量相结合,产生代理派生的分辨率分数,当聚类同时表现出一致性和互斥性时该分数最大化。作为初步测试,当应用于公开的K562 CRISPRi Perturb-seq数据集时,与轮廓系数(silhouette score)、模块度得分等经典指标相比,我们的分辨率得分所选择的聚类粒度与已知通路的基因功能富集总结更为一致。这些发现将LLM智能体确立为聚类分辨率和功能注释的客观裁定者,从而为单细胞多组学研究中全自动、上下文感知的解释管道铺平了道路。
摘要:Large-scale single-cell and Perturb-seq investigations routinely involve clustering cells and subsequently annotating each cluster with Gene-Ontology (GO) terms to elucidate the underlying biological programs. However, both stages, resolution selection and functional annotation, are inherently subjective, relying on heuristics and expert curation. We present HYPOGENEAGENT, a large language model (LLM)-driven framework, transforming cluster annotation into a quantitatively optimizable task. Initially, an LLM functioning as a gene-set analyst analyzes the content of each gene program or perturbation module and generates a ranked list of GO-based hypotheses, accompanied by calibrated confidence scores. Subsequently, we embed every predicted description with a sentence-embedding model, compute pair-wise cosine similarities, and let the agent referee panel score (i) the internal consistency of the predictions, high average similarity within the same cluster, termed intra-cluster agreement (ii) their external distinctiveness, low similarity between clusters, termed inter-cluster separation. These two quantities are combined to produce an agent-derived resolution score, which is maximized when clusters exhibit simultaneous coherence and mutual exclusivity. When applied to a public K562 CRISPRi Perturb-seq dataset as a preliminary test, our Resolution Score selects clustering granularities that exhibit alignment with known pathway compared to classical metrics such as silhouette score, modularity score for gene functional enrichment summary. These findings establish LLM agents as objective adjudicators of cluster resolution and functional annotation, thereby paving the way for fully automated, context-aware interpretation pipelines in single-cell multi-omics studies.
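摘要中"聚类内一致性高、聚类间相似度低"的打分思想,可以用嵌入的余弦相似度矩阵写成几行numpy。注意论文中这两个量由代理评审组合,此处用二者之差作为假设性的组合方式,仅作演示:

```python
import numpy as np

def resolution_score(emb, labels):
    # 聚类内一致性(希望高)与聚类间相似度(希望低)之差
    E = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    S = E @ E.T                                   # 成对余弦相似度
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    intra = S[same & off_diag].mean()             # 聚类内一致性
    inter = S[~same].mean()                       # 聚类间相似度
    return intra - inter

rng = np.random.default_rng(7)
centers = rng.normal(size=(3, 32))
emb = np.vstack([c + 0.1 * rng.normal(size=(10, 32)) for c in centers])
labels = np.repeat(np.arange(3), 10)
print(round(resolution_score(emb, labels), 3))                        # 清晰聚类 -> 高分
print(round(resolution_score(rng.normal(size=(30, 32)), labels), 3))  # 随机嵌入 -> 约0
```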


点云|SLAM|雷达|激光|深度RGBD相关(1篇)

【1】DGFusion: Depth-Guided Sensor Fusion for Robust Semantic Perception
标题:DGFusion:深度引导的传感器融合,用于鲁棒的语义感知
链接:https://arxiv.org/abs/2509.09828

作者:ermannn, Christos Sakaridis, Luigi Piccinelli, Wim Abbeloos, Luc Van Gool
备注:Code and models will be available at this https URL
摘要:自动驾驶汽车的鲁棒语义感知依赖于有效地组合具有互补优劣势的多个传感器。用于语义感知的最先进传感器融合方法通常在输入的整个空间范围内均匀地处理传感器数据,这在面对具有挑战性的条件时会损害性能。相比之下,我们提出了一种新颖的深度引导多模态融合方法,通过整合深度信息来升级条件感知融合。我们的网络DGFusion将多模态分割视为一个多任务问题,将室外传感器套件中通常可用的激光雷达测量值既用作模型的输入之一,又用作学习深度的真值。我们相应的辅助深度头有助于学习深度感知特征,这些特征被编码成空间变化的局部深度标记,用于调制我们的注意力跨模态融合。与全局条件标记一起,这些局部深度标记使传感器融合动态适应场景中每个传感器空间变化的可靠性,而这种可靠性在很大程度上取决于深度。此外,我们为深度学习目标提出了一种鲁棒损失,这对于从激光雷达输入中学习至关重要,因为在不利条件下激光雷达输入通常稀疏且嘈杂。我们的方法在具有挑战性的MUSES和DELIVER数据集上实现了最先进的全景和语义分割性能。代码和模型将在https://github.com/timbroed/DGFusion上提供。
摘要:Robust semantic perception for autonomous vehicles relies on effectively combining multiple sensors with complementary strengths and weaknesses. State-of-the-art sensor fusion approaches to semantic perception often treat sensor data uniformly across the spatial extent of the input, which hinders performance when faced with challenging conditions. By contrast, we propose a novel depth-guided multimodal fusion method that upgrades condition-aware fusion by integrating depth information. Our network, DGFusion, poses multimodal segmentation as a multi-task problem, utilizing the lidar measurements, which are typically available in outdoor sensor suites, both as one of the model's inputs and as ground truth for learning depth. Our corresponding auxiliary depth head helps to learn depth-aware features, which are encoded into spatially varying local depth tokens that condition our attentive cross-modal fusion. Together with a global condition token, these local depth tokens dynamically adapt sensor fusion to the spatially varying reliability of each sensor across the scene, which largely depends on depth. In addition, we propose a robust loss for our depth, which is essential for learning from lidar inputs that are typically sparse and noisy in adverse conditions. Our method achieves state-of-the-art panoptic and semantic segmentation performance on the challenging MUSES and DELIVER datasets. Code and models will be available at https://github.com/timbroed/DGFusion


联邦学习|隐私保护|加密(3篇)

【1】FedBiF: Communication-Efficient Federated Learning via Bits Freezing
标题:FedBiF:通过位冻结进行高效沟通的联邦学习
链接:https://arxiv.org/abs/2509.10161

作者:, Qunwei Li, Haozhao Wang, Ruixuan Li, Jianbin Lin, Wenliang Zhong
备注:Accepted by TPDS
摘要:联邦学习(FL)是一种新兴的分布式机器学习范式,它可以在不共享本地数据的情况下实现协作模型训练。尽管FL有其优点,但它存在大量的通信开销,这会影响训练效率。最近的努力通过量化模型更新来减少通信成本,从而缓解了这个问题。然而,大多数现有的方法仅在局部训练后应用量化,将量化误差引入训练参数并潜在地降低模型精度。在本文中,我们提出了联邦比特冻结(FedBiF),这是一种新的FL框架,可以在局部训练过程中直接学习量化模型参数。在每一轮通信中,服务器首先量化模型参数并将其发送给客户端。然后FedBiF允许每个客户端只更新多位参数表示的一个位,冻结其余位。这种逐位更新策略将每个参数更新减少到一位,同时保持参数表示的高精度。在IID和非IID设置下,对五个广泛使用的数据集进行了广泛的实验。结果表明,FedBiF不仅实现了卓越的通信压缩,而且还提高了生成模型的稀疏性。值得注意的是,FedBiF达到了与FedAvg相当的精度,即使在上行链路通信仅使用1比特每参数(bpp)和下行链路通信仅使用3 bpp时也是如此。该代码可在https://github.com/Leopold1423/fedbif-tpds25上获得。
摘要:Federated learning (FL) is an emerging distributed machine learning paradigm that enables collaborative model training without sharing local data. Despite its advantages, FL suffers from substantial communication overhead, which can affect training efficiency. Recent efforts have mitigated this issue by quantizing model updates to reduce communication costs. However, most existing methods apply quantization only after local training, introducing quantization errors into the trained parameters and potentially degrading model accuracy. In this paper, we propose Federated Bit Freezing (FedBiF), a novel FL framework that directly learns quantized model parameters during local training. In each communication round, the server first quantizes the model parameters and transmits them to the clients. FedBiF then allows each client to update only a single bit of the multi-bit parameter representation, freezing the remaining bits. This bit-by-bit update strategy reduces each parameter update to one bit while maintaining high precision in parameter representation. Extensive experiments are conducted on five widely used datasets under both IID and Non-IID settings. The results demonstrate that FedBiF not only achieves superior communication compression but also promotes sparsity in the resulting models. Notably, FedBiF attains accuracy comparable to FedAvg, even when using only 1 bit-per-parameter (bpp) for uplink and 3 bpp for downlink communication. The code is available at https://github.com/Leopold1423/fedbif-tpds25.
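"每轮只更新多比特参数表示中的一个比特、冻结其余比特"这一机制,可以用定点量化参数的比特翻转来直观演示。下面的numpy草图是一个高度简化的示意(量化格式、翻转准则均为假设,并非FedBiF的原始算法):

```python
import numpy as np

BITS = 4                       # 每个参数的定点位数(示意)
SCALE = 2 ** (BITS - 1)

def to_bits(w):
    # 将[-1,1)内的参数量化为BITS位无符号整数的比特表示
    q = np.clip(((w + 1) * SCALE).astype(int), 0, 2 ** BITS - 1)
    return ((q[:, None] >> np.arange(BITS)) & 1).astype(np.uint8)

def from_bits(bits):
    q = (bits * (1 << np.arange(BITS))).sum(axis=1)
    return q / SCALE - 1

def local_bit_update(bits, grad, pos):
    # 只允许翻转第pos位(其余位冻结); 仅当翻转使参数朝负梯度方向移动时接受
    cand = bits.copy()
    cand[:, pos] ^= 1
    better = (from_bits(cand) - from_bits(bits)) * grad < 0
    out = bits.copy()
    out[better] = cand[better]
    return out

rng = np.random.default_rng(8)
w = rng.uniform(-1, 1, size=6)
bits = to_bits(w)
grad = rng.normal(size=6)
for rnd in range(BITS):        # 每轮通信只更新一个比特位
    bits = local_bit_update(bits, grad, pos=rnd)
print(w.round(3), from_bits(bits).round(3))
```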


【2】Cost-Free Personalization via Information-Geometric Projection in Bayesian Federated Learning
标题:通过贝叶斯联邦学习中的信息几何投影实现无成本个性化
链接:https://arxiv.org/abs/2509.10132

作者:ussi, Giuseppe Serra, Photios A. Stavrou, Marios Kountouris
摘要:贝叶斯联邦学习(Bayesian Federated Learning,BFL)将不确定性建模与分散式训练相结合,能够在数据异构性和隐私约束下开发个性化且可靠的模型。现有的方法通常依赖于马尔可夫链蒙特卡罗(MCMC)采样或变分推理,并常结合个性化机制,以更好地适应本地数据分布。在这项工作中,我们提出了一个用于参数化BFL个性化的信息几何投影框架。通过将全局模型投影到用户本地模型的邻域上,我们的方法实现了全局泛化和局部专业化之间的可调权衡。在温和的假设下,我们表明这个投影步骤相当于计算统计流形上的重心,使我们能够得到封闭形式的解,实现无成本的个性化。我们将所提出的方法应用于使用改进变分在线牛顿(IVON)优化器的变分学习设置,并将其应用扩展到BFL中的一般聚合方案。异构数据分布下的实证评估证实,我们的方法以极小的计算开销有效平衡了全局和本地性能。
摘要 :Bayesian Federated Learning (BFL) combines uncertainty modeling with decentralized training, enabling the development of personalized and reliable models under data heterogeneity and privacy constraints. Existing approaches typically rely on Markov Chain Monte Carlo (MCMC) sampling or variational inference, often incorporating personalization mechanisms to better adapt to local data distributions. In this work, we propose an information-geometric projection framework for personalization in parametric BFL. By projecting the global model onto a neighborhood of the user's local model, our method enables a tunable trade-off between global generalization and local specialization. Under mild assumptions, we show that this projection step is equivalent to computing a barycenter on the statistical manifold, allowing us to derive closed-form solutions and achieve cost-free personalization. We apply the proposed approach to a variational learning setup using the Improved Variational Online Newton (IVON) optimizer and extend its application to general aggregation schemes in BFL. Empirical evaluations under heterogeneous data distributions confirm that our method effectively balances global and local performance with minimal computational overhead.


【3】FedRP: A Communication-Efficient Approach for Differentially Private Federated Learning Using Random Projection
标题:FedRP:一种使用随机投影进行差分隐私联邦学习的通信高效方法
链接:https://arxiv.org/abs/2509.10041

作者:Hasan Narimani, Mostafa Tavassolipour
摘要:联邦学习(FL)为智能手机等分散设备的协作模型训练提供了一种创新范式,在物联网(IoT)和医疗数据分析等敏感领域兼顾增强的预测性能与用户隐私保护。尽管FL具有优势,但它在保护用户隐私免受潜在攻击和管理通信成本方面遇到了重大挑战。本文介绍了一种新的联邦学习算法FedRP,它将随机投影技术与交替方向乘子法(ADMM)优化框架相结合。该方法在将模型参数传输到中央服务器之前,采用随机投影来降低其维数,从而在增强隐私的同时降低通信成本。所提出的算法提供了强有力的$(\epsilon, \delta)$-差分隐私保证,表现出对数据重构攻击的弹性。实验结果表明,FedRP不仅保持了较高的模型精度,而且在隐私保护和通信效率方面都优于现有方法,包括传统的差分隐私方法和FedADMM。
摘要:Federated learning (FL) offers an innovative paradigm for collaborative model training across decentralized devices, such as smartphones, balancing enhanced predictive performance with the protection of user privacy in sensitive areas like Internet of Things (IoT) and medical data analysis. Despite its advantages, FL encounters significant challenges related to user privacy protection against potential attacks and the management of communication costs. This paper introduces a novel federated learning algorithm called FedRP, which integrates random projection techniques with the Alternating Direction Method of Multipliers (ADMM) optimization framework. This approach enhances privacy by employing random projection to reduce the dimensionality of model parameters prior to their transmission to a central server, reducing the communication cost. The proposed algorithm offers a strong $(\epsilon, \delta)$-differential privacy guarantee, demonstrating resilience against data reconstruction attacks. Experimental results reveal that FedRP not only maintains high model accuracy but also outperforms existing methods, including conventional differential privacy approaches and FedADMM, in terms of both privacy preservation and communication efficiency.
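FedRP的通信压缩核心是上传模型更新的低维随机投影。下面的numpy草图演示:客户端与服务器用共享种子重建同一投影矩阵,客户端上传加噪的投影,服务器直接在低维空间聚合(JL性质保证范数近似保持)。实际的FedRP在ADMM框架内处理投影并附带差分隐私保证,此处的聚合方式仅为说明性简化:

```python
import numpy as np

def projector(d, k, seed):
    # 客户端与服务器用共享种子重建同一投影矩阵(矩阵本身无需传输)
    rng = np.random.default_rng(seed)
    return rng.normal(size=(k, d)) / np.sqrt(k)

d, k = 10_000, 200                    # 通信量压缩约50倍
P = projector(d, k, seed=42)

rng = np.random.default_rng(0)
updates = [0.01 * rng.normal(size=d) for _ in range(4)]          # 4个客户端的更新
uploads = [P @ u + 1e-4 * rng.normal(size=k) for u in updates]   # 投影 + 加噪上传

agg_low = np.mean(uploads, axis=0)    # 服务器直接在低维空间聚合
agg_true = np.mean(updates, axis=0)
print(np.linalg.norm(agg_low - P @ agg_true))              # 仅剩噪声量级
print(np.linalg.norm(agg_low), np.linalg.norm(agg_true))   # 范数同量级(JL性质)
```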


推理|分析|理解|解释(3篇)

【1】Understanding Outer Optimizers in Local SGD: Learning Rates, Momentum, and Acceleration
标题:理解Local SGD中的外部优化器:学习率、动量与加速
链接:https://arxiv.org/abs/2509.10439

作者:led, Satyen Kale, Arthur Douillard, Chi Jin, Rob Fergus, Manzil Zaheer
摘要:现代机器学习通常需要使用大批量、分布式数据和大规模并行计算硬件(如移动和其他边缘设备或分布式数据中心)进行训练。在这种情况下,通信成为一个主要瓶颈,但像局部随机梯度下降(Local SGD)这样的方法在减少这种额外通信开销方面显示出很大的希望。Local SGD由三部分组成:局部优化过程、聚合机制,以及使用来自各节点的聚合更新生成新模型的外部优化器。虽然有大量文献研究局部优化过程中超参数的影响,但外部优化器及其超参数的选择则不那么清楚。我们研究了Local SGD中外部优化器的作用,并证明了该算法新的收敛保证。特别是,我们表明,调整外部学习率使我们能够(a)在优化误差和随机梯度噪声方差之间进行权衡,以及(b)弥补内部学习率的失调。我们的理论表明,外部学习率有时应该设置为大于1的值。我们将结果扩展到在外部优化器中使用动量的设置,并展示了动量调整后的外部学习率具有类似的作用。我们还研究了外部优化器中的加速,并表明它改善了作为通信轮数函数的收敛速度,优于先前在局部应用加速的算法的收敛速度。最后,我们还介绍了一种新的数据依赖的Local SGD分析,为外部学习率调整提供了进一步的见解。我们用标准语言模型和各种外部优化器进行了全面的实验,以验证我们的理论。
摘要:Modern machine learning often requires training with large batch size, distributed data, and massively parallel compute hardware (like mobile and other edge devices or distributed data centers). Communication becomes a major bottleneck in such settings but methods like Local Stochastic Gradient Descent (Local SGD) show great promise in reducing this additional communication overhead. Local SGD consists of three parts: a local optimization process, an aggregation mechanism, and an outer optimizer that uses the aggregated updates from the nodes to produce a new model. While there exists an extensive literature on understanding the impact of hyperparameters in the local optimization process, the choice of outer optimizer and its hyperparameters is less clear. We study the role of the outer optimizer in Local SGD, and prove new convergence guarantees for the algorithm. In particular, we show that tuning the outer learning rate allows us to (a) trade off between optimization error and stochastic gradient noise variance, and (b) make up for ill-tuning of the inner learning rate. Our theory suggests that the outer learning rate should sometimes be set to values greater than $1$. We extend our results to settings where we use momentum in the outer optimizer, and we show a similar role for the momentum-adjusted outer learning rate. We also study acceleration in the outer optimizer and show that it improves the convergence rate as a function of the number of communication rounds, improving upon the convergence rate of prior algorithms that apply acceleration locally. Finally, we also introduce a novel data-dependent analysis of Local SGD that yields further insights on outer learning rate tuning. We conduct comprehensive experiments with standard language models and various outer optimizers to validate our theory.
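下面在一个最小二乘玩具任务上给出Local SGD外层结构的numpy草图:各节点从全局模型出发本地走若干步SGD,服务器对更新取平均,再交给一个带动量的外部优化器;示例中外部学习率取大于1的值,对应摘要的理论建议。任务与超参数均为示意:

```python
import numpy as np

def local_round(x, shards, inner_lr=0.05, local_steps=10):
    # 每个节点从全局模型x出发本地做SGD, 返回各节点更新的平均
    deltas = []
    for A, b in shards:                       # 每个分片是一个最小二乘子任务
        xi = x.copy()
        for _ in range(local_steps):
            xi -= inner_lr * A.T @ (A @ xi - b) / len(b)
        deltas.append(xi - x)
    return np.mean(deltas, axis=0)

rng = np.random.default_rng(9)
x_true = rng.normal(size=20)
shards = []
for _ in range(8):
    A = rng.normal(size=(50, 20))
    shards.append((A, A @ x_true + 0.01 * rng.normal(size=50)))

x, m = np.zeros(20), np.zeros(20)
outer_lr, momentum = 1.5, 0.9                 # 外部学习率 > 1
for _ in range(100):                          # 100轮通信
    m = momentum * m + local_round(x, shards) # 外部优化器: 带动量的SGD
    x += outer_lr * m
print(round(float(np.linalg.norm(x - x_true)), 4))   # 与真值的距离持续缩小
```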


【2】D-CAT: Decoupled Cross-Attention Transfer between Sensor Modalities for Unimodal Inference
标题:D-CAT:用于单模态推理的传感器模态间解耦交叉注意力迁移
链接:https://arxiv.org/abs/2509.09747

作者:r, Zhaobo Wang, Malcolm Mielle
摘要:跨模态迁移学习用于改进多模态分类模型(例如,用于人机协作中的人类活动识别)。然而,现有方法在训练和推理时都需要配对的传感器数据,从而限制了在资源受限环境中的部署——在这些环境中,完整的传感器套件在经济和技术上都不可行。为了解决这个问题,我们提出了解耦交叉注意力迁移(D-CAT),这是一个在推理过程中不需要联合传感器模态即可对齐模态特定表示的框架。我们的方法将用于特征提取的自注意力模块与一种新颖的交叉注意力对齐损失相结合,后者在无需耦合两种模态的分类管道的情况下强制对齐传感器的特征空间。我们在三个多模态人类活动数据集(IMU、视频和音频)上,在分布内和分布外两种场景下评估D-CAT,并与单模态模型进行比较。结果表明,在分布内场景中,从高性能模态迁移(例如,视频到IMU)比单模态训练产生高达10%的F1分数增益。在分布外场景中,只要目标模型没有在训练数据上过拟合,即使是较弱的源模态(例如,IMU到视频)也能提高目标性能。通过启用具有跨模态知识的单传感器推理,D-CAT在保持准确性的同时减少了感知系统的硬件冗余,这对于成本敏感或自适应部署(例如,传感器可用性可变的家庭中的辅助机器人)至关重要。代码可在https://github.com/Schindler-EPFL-Lab/D-CAT上获得。
摘要:Cross-modal transfer learning is used to improve multi-modal classification models (e.g., for human activity recognition in human-robot collaboration). However, existing methods require paired sensor data at both training and inference, limiting deployment in resource-constrained environments where full sensor suites are not economically and technically usable. To address this, we propose Decoupled Cross-Attention Transfer (D-CAT), a framework that aligns modality-specific representations without requiring joint sensor modality during inference. Our approach combines a self-attention module for feature extraction with a novel cross-attention alignment loss, which enforces the alignment of sensors' feature spaces without requiring the coupling of the classification pipelines of both modalities. We evaluate D-CAT on three multi-modal human activity datasets (IMU, video, and audio) under both in-distribution and out-of-distribution scenarios, comparing against uni-modal models. Results show that in in-distribution scenarios, transferring from high-performing modalities (e.g., video to IMU) yields up to 10% F1-score gains over uni-modal training. In out-of-distribution scenarios, even weaker source modalities (e.g., IMU to video) improve target performance, as long as the target model isn't overfitted on the training data. By enabling single-sensor inference with cross-modal knowledge, D-CAT reduces hardware redundancy for perception systems while maintaining accuracy, which is critical for cost-sensitive or adaptive deployments (e.g., assistive robots in homes with variable sensor availability). Code is available at https://github.com/Schindler-EPFL-Lab/D-CAT.


【3】Error Analysis in a Modular Meeting Transcription System
标题:模块化会议转录系统中的错误分析
链接:https://arxiv.org/abs/2509.10143

作者:ting, Simon Berger, Thilo von Neumann, Christoph Boeddeker, Ralf Schlüter, Reinhold Haeb-Umbach
备注:Accepted at ITG Conference on Speech Communication 2025
摘要:会议转录是近年来相关性很高且进展显著的一个研究领域。然而,限制其性能的挑战依然存在。在这项工作中,我们扩展了先前提出的用于分析语音分离中泄漏的框架,使其对时间局部性具有适当的敏感度。我们发现,在只有主说话人活跃的区域,存在显著的向交叉通道的泄漏。同时,结果表明这对最终性能影响不大,因为这些泄漏部分在很大程度上被语音活动检测(VAD)忽略。此外,我们比较了不同的分割方式,结果表明,与简单的基于能量的VAD相比,先进的说话人日志(diarization)方法能够将与oracle分割之间的差距缩小三分之一。我们还揭示了哪些因素造成了剩余的差异。在仅使用LibriSpeech数据训练识别模块的系统中,这些结果代表了LibriCSS上的最先进性能。
摘要:Meeting transcription is a field of high relevance and remarkable progress in recent years. Still, challenges remain that limit its performance. In this work, we extend a previously proposed framework for analyzing leakage in speech separation with proper sensitivity to temporal locality. We show that there is significant leakage to the cross channel in areas where only the primary speaker is active. At the same time, the results demonstrate that this does not affect the final performance much as these leaked parts are largely ignored by the voice activity detection (VAD). Furthermore, different segmentations are compared showing that advanced diarization approaches are able to reduce the gap to oracle segmentation by a third compared to a simple energy-based VAD. We additionally reveal what factors contribute to the remaining difference. The results represent state-of-the-art performance on LibriCSS among systems that train the recognition module on LibriSpeech data only.


检测相关(4篇)

【1】Investigating Feature Attribution for 5G Network Intrusion Detection
标题:5G网络入侵检测中特征归因的研究
链接:https://arxiv.org/abs/2509.10206

作者:Uccello, Simin Nadjm-Tehrani
摘要:随着第五代(5G)网络在关键应用中的兴起,迫切需要从检测恶意活动转向能够提供适合缓解处置的可靠判定的系统。在这方面,理解和解释机器学习(ML)模型的安全警报对于实现可操作的事件响应编排至关重要。可解释人工智能(XAI)技术有望通过提供警报触发原因的洞察来增强信任。一种占主导地位的方法是在统计上关联可与给定警报相关联的特征集。本文首先质疑这种归因是否适用于下一代通信系统,并与基于逻辑解释的方法相比较,考察其优劣。我们深入研究了SHAP和VoTE-XAI这两种方法,分析了它们对XGBoost模型在三个不同用例、多种5G通信攻击下所生成警报的解释。我们确定了评估解释的三个指标:稀疏性,即解释有多简洁;稳定性,即解释在同一攻击类型的样本间的一致性;以及效率,即生成解释的速度。例如,在一个具有92个特征的5G网络中,对于拒绝服务(DoS)变体ICMPFlood,VoTE-XAI认为其中6个特征重要,而SHAP识别出20多个。更重要的是,我们发现SHAP和VoTE-XAI所选特征之间存在显著差异;不过,SHAP选出的排名靠前的特征没有一个被VoTE-XAI遗漏。在提供解释的效率方面,我们发现VoTE-XAI的响应明显更快,例如在高维设置(478个特征)中,它能在0.002秒内给出一条解释。
摘要:With the rise of fifth-generation (5G) networks in critical applications, it is urgent to move from detection of malicious activity to systems capable of providing a reliable verdict suitable for mitigation. In this regard, understanding and interpreting machine learning (ML) models' security alerts is crucial for enabling actionable incident response orchestration. Explainable Artificial Intelligence (XAI) techniques are expected to enhance trust by providing insights into why alerts are raised. A dominant approach statistically associates feature sets that can be correlated to a given alert. This paper starts by questioning whether such attribution is relevant for future generation communication systems, and investigates its merits in comparison with an approach based on logical explanations. We extensively study two methods, SHAP and VoTE-XAI, by analyzing their interpretations of alerts generated by an XGBoost model in three different use cases with several 5G communication attacks. We identify three metrics for assessing explanations: sparsity, how concise they are; stability, how consistent they are across samples from the same attack type; and efficiency, how fast an explanation is generated. As an example, in a 5G network with 92 features, 6 were deemed important by VoTE-XAI for a Denial of Service (DoS) variant, ICMPFlood, while SHAP identified over 20. More importantly, we found a significant divergence between features selected by SHAP and VoTE-XAI. However, none of the top-ranked features selected by SHAP were missed by VoTE-XAI. When it comes to efficiency of providing interpretations, we found that VoTE-XAI is significantly more responsive, e.g. it provides a single explanation in under 0.002 seconds, in a high-dimensional setting (478 features).
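
下面的草图展示了摘要中三个评估指标里稀疏性与稳定性的一种可能计算方式:先用SHAP的TreeExplainer解释一个XGBoost告警模型,再统计每条解释中的重要特征数与同类样本间top-k特征集合的Jaccard一致性(数据为随机占位,阈值与k均为假设取值,并非论文的确切定义):

import numpy as np
import shap
import xgboost as xgb

# 假设 X_train/y_train 是流量特征与告警标签(此处随机数据仅作占位)
X_train = np.random.rand(500, 92)
y_train = np.random.randint(0, 2, 500)
model = xgb.XGBClassifier(n_estimators=50).fit(X_train, y_train)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_train[:100])  # (100, 92)

# 稀疏性:平均每条告警中"重要"特征的个数(阈值为假设)
important = np.abs(shap_values) > 0.01
sparsity = important.sum(axis=1).mean()

# 稳定性:同一攻击类型样本间 top-k 特征集合的平均 Jaccard 相似度
k = 6
topk = np.argsort(-np.abs(shap_values), axis=1)[:, :k]
sets = [set(row) for row in topk]
jaccards = [len(a & b) / len(a | b) for i, a in enumerate(sets)
            for b in sets[i + 1:]]
print(f"sparsity={sparsity:.1f}, stability={np.mean(jaccards):.2f}")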


【2】WAVE-DETR Multi-Modal Visible and Acoustic Real-Life Drone Detector
标题:WAVE-DETR:可见光与声学多模态的真实场景无人机检测器
链接:https://arxiv.org/abs/2509.09859

作者:efanescu, Ethan Oh, Ruben Vazquez, Chris Mesterharm, Constantin Serban, Ritu Chadha
备注:11 pages, 11 figures
摘要:我们介绍了一种多模态WAVE-DETR无人机检测器,它结合可见光RGB和声学信号,用于真实场景下鲁棒的无人机目标检测。我们的方法在一个统一的目标检测模型中融合视觉和声学特征,该模型基于Deformable DETR和Wav2Vec2架构,在具有挑战性的环境条件下实现了强劲的性能。我们的工作利用了现有的Drone-vs-Bird数据集和新生成的ARDrone数据集,后者包含超过7,500个同步的图像和音频片段。我们展示了声学信息如何用于提升Deformable DETR目标检测器在真实ARDrone数据集上的性能。我们开发、训练并测试了基于门控机制、线性层、MLP和交叉注意力的四种不同融合配置。Wav2Vec2声学嵌入与Deformable DETR的多分辨率特征图相融合,提升了各种尺寸无人机的目标检测性能。表现最好的是门控融合方法:对于小型无人机,在0.5到0.9之间的所有IoU阈值下,它将Deformable DETR目标检测器在我们的分布内和分布外ARDrone数据集上的mAP提高了11.1%到15.3%。中型和大型无人机的mAP得分也有所提升,所有无人机尺寸上的总体增益从3.27%到5.84%不等。
摘要:We introduce a multi-modal WAVE-DETR drone detector combining visible RGB and acoustic signals for robust real-life UAV object detection. Our approach fuses visual and acoustic features in a unified object detector model relying on the Deformable DETR and Wav2Vec2 architectures, achieving strong performance under challenging environmental conditions. Our work leverage the existing Drone-vs-Bird dataset and the newly generated ARDrone dataset containing more than 7,500 synchronized images and audio segments. We show how the acoustic information is used to improve the performance of the Deformable DETR object detector on the real ARDrone dataset. We developed, trained and tested four different fusion configurations based on a gated mechanism, linear layer, MLP and cross attention. The Wav2Vec2 acoustic embeddings are fused with the multi resolution feature mappings of the Deformable DETR and enhance the object detection performance over all drones dimensions. The best performer is the gated fusion approach, which improves the mAP of the Deformable DETR object detector on our in-distribution and out-of-distribution ARDrone datasets by 11.1% to 15.3% for small drones across all IoU thresholds between 0.5 and 0.9. The mAP scores for medium and large drones are also enhanced, with overall gains across all drone sizes ranging from 3.27% to 5.84%.
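
表现最好的门控融合可以用如下极简模块来示意(假设性实现,非论文官方代码):把Wav2Vec2声学嵌入投影到视觉特征维度,再以sigmoid门控做残差注入;实际系统需在Deformable DETR的每个分辨率特征图上分别融合。

import torch
import torch.nn as nn

class GatedAudioVisualFusion(nn.Module):
    """门控音视频融合的示意实现(假设性,非论文官方代码)。"""
    def __init__(self, vis_dim: int, audio_dim: int = 768):
        super().__init__()
        self.proj = nn.Linear(audio_dim, vis_dim)
        self.gate = nn.Linear(2 * vis_dim, vis_dim)

    def forward(self, vis_feat, audio_emb):
        # vis_feat: (B, HW, C) 某一分辨率的视觉特征;audio_emb: (B, audio_dim)
        a = self.proj(audio_emb).unsqueeze(1).expand_as(vis_feat)
        g = torch.sigmoid(self.gate(torch.cat([vis_feat, a], dim=-1)))
        return vis_feat + g * a  # 门控残差注入

fused = GatedAudioVisualFusion(vis_dim=256)(torch.randn(2, 400, 256), torch.randn(2, 768))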


【3】Early Detection of Visual Impairments at Home Using a Smartphone Red-Eye Reflex Test
标题:使用智能手机红眼反射测试早期检测家中视觉障碍
链接:https://arxiv.org/abs/2509.09808

作者:ssmann, Alexander Lichtenstein, Francisco M. López
备注:Accepted at IEEE ICDL 2025. 6 pages, 7 figures, 2 tables
摘要:许多视觉障碍可以从幼儿的红眼反射图像中检测出来。所谓的布鲁克纳(Bruckner)测试传统上由眼科医生在临床环境中进行。得益于智能手机和人工智能的最新技术进步,如今可以使用移动设备重现布鲁克纳测试。在本文中,我们介绍了在开发KidsVisionCheck过程中进行的首项研究,这是一款可以利用红眼反射图像在移动设备上执行视力筛查的免费应用程序。底层模型依赖于深度神经网络,其训练数据为由眼科医生收集并标注的儿童瞳孔图像。我们的模型在未见过的测试数据上达到90%的准确率,无需专业设备即可提供高度可靠的性能。此外,我们还能确定数据采集的最佳条件,进而用于向用户提供即时反馈。总之,这项工作标志着迈向全球范围内可及的儿童视力筛查和视力异常早期干预的第一步。
摘要:Numerous visual impairments can be detected in red-eye reflex images from young children. The so-called Bruckner test is traditionally performed by ophthalmologists in clinical settings. Thanks to the recent technological advances in smartphones and artificial intelligence, it is now possible to recreate the Bruckner test using a mobile device. In this paper, we present a first study conducted during the development of KidsVisionCheck, a free application that can perform vision screening with a mobile device using red-eye reflex images. The underlying model relies on deep neural networks trained on children's pupil images collected and labeled by an ophthalmologist. With an accuracy of 90% on unseen test data, our model provides highly reliable performance without the necessity of specialist equipment. Furthermore, we can identify the optimal conditions for data collection, which can in turn be used to provide immediate feedback to the users. In summary, this work marks a first step toward accessible pediatric vision screenings and early intervention for vision abnormalities worldwide.


【4】Drone-Based Multispectral Imaging and Deep Learning for Timely Detection of Branched Broomrape in Tomato Farms
标题:基于无人机的多光谱成像和深度学习用于及时检测番茄农场中的分枝列当
链接:https://arxiv.org/abs/2509.09972

作者:eza Narimani, Alireza Pourreza, Ali Moghimi, Mohsen Mesgaran, Parastoo Farajpoor, Hamid Jafarbiglu
备注:Author-accepted version (no publisher header/footer). 10 pages +   presentation. Published in Proceedings of SPIE Defense + Commercial Sensing   2024, Vol. 13053, Paper 1305304. Event: National Harbor, Maryland, USA.   Official version: https://doi.org/10.1117/12.3021219
摘要:这项研究针对分枝列当(Phelipanche ramosa)对加州番茄产业不断升级的威胁,该产业供应美国90%以上的加工番茄。这种寄生植物的生命周期主要在地下进行,使早期检测十分困难,而传统的化学防治成本高、对环境有害,而且往往效果不佳。为解决这一问题,我们将基于无人机的多光谱影像与长短期记忆(LSTM)深度学习网络相结合,并使用合成少数类过采样技术(SMOTE)来处理类别不平衡。研究在加利福尼亚州尤洛县伍德兰一个已知受列当侵染的番茄农场进行,覆盖由生长度日(GDD)确定的五个关键生长阶段。多光谱图像经过处理以分离番茄冠层反射率。在897 GDD时,在不整合后期生长阶段的情况下,列当检测的总体准确率为79.09%,召回率为70.36%。将连续生长阶段与LSTM结合后,检测性能大幅提升。表现最好的方案整合了所有生长阶段并辅以SMOTE增强,实现了88.37%的总体准确率和95.37%的召回率。这些结果证明了时序多光谱分析和LSTM网络在列当早期检测中的强大潜力。虽然实际部署还需要进一步收集真实世界数据,但这项研究表明,基于无人机的多光谱传感与深度学习相结合,可以提供一种强大的精准农业工具,以减少损失并提高番茄生产的可持续性。
摘要:This study addresses the escalating threat of branched broomrape (Phelipanche ramosa) to California's tomato industry, which supplies over 90 percent of U.S. processing tomatoes. The parasite's largely underground life cycle makes early detection difficult, while conventional chemical controls are costly, environmentally harmful, and often ineffective. To address this, we combined drone-based multispectral imagery with Long Short-Term Memory (LSTM) deep learning networks, using the Synthetic Minority Over-sampling Technique (SMOTE) to handle class imbalance. Research was conducted on a known broomrape-infested tomato farm in Woodland, Yolo County, CA, across five key growth stages determined by growing degree days (GDD). Multispectral images were processed to isolate tomato canopy reflectance. At 897 GDD, broomrape could be detected with 79.09 percent overall accuracy and 70.36 percent recall without integrating later stages. Incorporating sequential growth stages with LSTM improved detection substantially. The best-performing scenario, which integrated all growth stages with SMOTE augmentation, achieved 88.37 percent overall accuracy and 95.37 percent recall. These results demonstrate the strong potential of temporal multispectral analysis and LSTM networks for early broomrape detection. While further real-world data collection is needed for practical deployment, this study shows that UAV-based multispectral sensing coupled with deep learning could provide a powerful precision agriculture tool to reduce losses and improve sustainability in tomato production.
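
下面的草图示意摘要所述流程的核心两步:先用SMOTE对少数类(受侵染样本)过采样,再用LSTM对跨生长阶段的冠层反射率序列分类(特征维度、展平方式等均为假设性选择,非论文确切配置):

import numpy as np
import torch
import torch.nn as nn
from imblearn.over_sampling import SMOTE

# 假设每个样本是5个生长阶段的冠层反射率特征序列:(N, 5, n_bands)
N, T, D = 600, 5, 10
X = np.random.rand(N, T, D).astype("float32")
y = np.random.binomial(1, 0.1, N)  # 列当侵染为少数类

# SMOTE 作用于展平后的序列,再还原形状(常见做法之一,属假设性选择)
X_res, y_res = SMOTE().fit_resample(X.reshape(N, -1), y)
X_res = X_res.reshape(-1, T, D).astype("float32")

class BroomrapeLSTM(nn.Module):
    def __init__(self, d_in, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(d_in, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)
    def forward(self, x):
        _, (h, _) = self.lstm(x)
        return self.head(h[-1])  # 用最后隐状态分类

logits = BroomrapeLSTM(D)(torch.from_numpy(X_res[:32]))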


表征(2篇)

【1】Sparse Coding Representation of 2-way Data
标题:双向数据的稀疏编码表示
链接:https://arxiv.org/abs/2509.10033

作者:Abram Magner, Maxwell McNeil, Petko Bogdanov
摘要:稀疏字典编码将信号表示为少数字典原子的线性组合。通过联合使用时间和空间字典,它已被应用于图像、时间序列、图信号和多路时空数据。与数据无关的分析字典,如离散傅立叶变换、小波和图傅立叶,由于实现高效且实际性能良好而被广泛采用。另一方面,从数据中学习的字典能提供更稀疏、更准确的解,但需要同时学习字典和编码系数。这在多字典场景下尤其具有挑战性,因为编码系数对应于各字典原子的所有组合。为应对这一挑战,我们提出了一个用于双字典场景的低秩编码模型,并研究其数据复杂度。也就是说,我们建立了学习能泛化到同分布未见样本的字典所需样本数量的界。我们提出了一种称为AODL的凸松弛求解方法,并证明其精确解同样是原问题的解。随后,我们通过在稀疏编码矩阵与所学字典之间交替优化来求解该松弛问题,并证明了该过程是收敛的。我们在合成和真实数据集上展示了其在数据重建和缺失值填补方面的质量。在固定重建质量下,与非低秩和分析(固定)字典基线相比,AODL学到的解的稀疏度最高可提升90%。此外,学到的字典还能对训练样本中存在的模式给出可解释的洞察。
摘要:Sparse dictionary coding represents signals as linear combinations of a few dictionary atoms. It has been applied to images, time series, graph signals and multi-way spatio-temporal data by jointly employing temporal and spatial dictionaries. Data-agnostic analytical dictionaries, such as the discrete Fourier transform, wavelets and graph Fourier, have seen wide adoption due to efficient implementations and good practical performance. On the other hand, dictionaries learned from data offer sparser and more accurate solutions but require learning of both the dictionaries and the coding coefficients. This becomes especially challenging for multi-dictionary scenarios since encoding coefficients correspond to all atom combinations from the dictionaries. To address this challenge, we propose a low-rank coding model for 2-dictionary scenarios and study its data complexity. Namely, we establish a bound on the number of samples needed to learn dictionaries that generalize to unseen samples from the same distribution. We propose a convex relaxation solution, called AODL, whose exact solution we show also solves the original problem. We then solve this relaxation via alternating optimization between the sparse coding matrices and the learned dictionaries, which we prove to be convergent. We demonstrate its quality for data reconstruction and missing value imputation in both synthetic and real-world datasets. For a fixed reconstruction quality, AODL learns up to 90\% sparser solutions compared to non-low-rank and analytical (fixed) dictionary baselines. In addition, the learned dictionaries reveal interpretable insights into patterns present within the samples used for training.
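
作为概念示意,下面给出双字典模型 X ≈ D1·A·D2ᵀ 的一个交替优化草图:固定字典时对编码A做一步ISTA(梯度步加软阈值),固定A时对字典做最小二乘更新。这只是通用的简化版本,未包含论文中的低秩结构、AODL凸松弛及收敛性分析:

import numpy as np

def soft_threshold(Z, lam):
    return np.sign(Z) * np.maximum(np.abs(Z) - lam, 0.0)

def two_dict_sparse_coding(X, k1, k2, lam=0.1, iters=50, lr=0.01):
    """双字典稀疏编码 X ≈ D1 @ A @ D2.T 的交替优化草图(假设性简化)。"""
    m, n = X.shape
    rng = np.random.default_rng(0)
    D1, D2 = rng.standard_normal((m, k1)), rng.standard_normal((n, k2))
    A = np.zeros((k1, k2))
    for _ in range(iters):
        # (1) 固定字典,对编码 A 做一步 ISTA:梯度步 + 软阈值
        R = X - D1 @ A @ D2.T
        A = soft_threshold(A + lr * D1.T @ R @ D2, lam * lr)
        # (2) 固定 A,最小二乘更新两个字典并按列归一化
        D1 = X @ D2 @ A.T @ np.linalg.pinv(A @ D2.T @ D2 @ A.T)
        D2 = X.T @ D1 @ A @ np.linalg.pinv(A.T @ D1.T @ D1 @ A)
        D1 /= np.linalg.norm(D1, axis=0, keepdims=True) + 1e-12
        D2 /= np.linalg.norm(D2, axis=0, keepdims=True) + 1e-12
    return D1, A, D2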


【2】CoDiCodec: Unifying Continuous and Discrete Compressed Representations of Audio
标题:CoDiCodec:统一音频的连续和离散压缩表示
链接:https://arxiv.org/abs/2509.09836

作者:ini, Stefan Lattner, George Fazekas
备注:Accepted to ISMIR 2025
摘要:在压缩的潜在空间中高效表示音频信号对潜在生成建模至关重要。然而,现有的自动编码器通常迫使人们在连续嵌入和离散令牌之间二选一。此外,在保持音频保真度的同时实现高压缩比仍然是一个挑战。我们介绍了CoDiCodec,一种克服上述限制的新型音频自动编码器:它既能通过摘要嵌入高效编码全局特征,又能从同一个训练好的模型中同时产生约11 Hz的压缩连续嵌入和速率为2.38 kbps的离散令牌,为不同的下游生成任务提供了前所未有的灵活性。这是通过有限标量量化(FSQ)和一种新的FSQ-dropout技术实现的,并且除了用于端到端训练的单一一致性损失之外,不需要额外的损失项。CoDiCodec同时支持自回归解码和一种新颖的并行解码策略,后者实现了更优的音频质量和更快的解码速度。在相近比特率下,CoDiCodec的重建音频质量优于现有的连续和离散自动编码器。我们的工作实现了音频压缩的统一方法,弥合了连续与离散生成建模范式之间的差距。
摘要:Efficiently representing audio signals in a compressed latent space is critical for latent generative modelling. However, existing autoencoders often force a choice between continuous embeddings and discrete tokens. Furthermore, achieving high compression ratios while maintaining audio fidelity remains a challenge. We introduce CoDiCodec, a novel audio autoencoder that overcomes these limitations by both efficiently encoding global features via summary embeddings, and by producing both compressed continuous embeddings at ~ 11 Hz and discrete tokens at a rate of 2.38 kbps from the same trained model, offering unprecedented flexibility for different downstream generative tasks. This is achieved through Finite Scalar Quantization (FSQ) and a novel FSQ-dropout technique, and does not require additional loss terms beyond the single consistency loss used for end-to-end training. CoDiCodec supports both autoregressive decoding and a novel parallel decoding strategy, with the latter achieving superior audio quality and faster decoding. CoDiCodec outperforms existing continuous and discrete autoencoders at similar bitrates in terms of reconstruction audio quality. Our work enables a unified approach to audio compression, bridging the gap between continuous and discrete generative modelling paradigms.
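
有限标量量化(FSQ)的核心思想可以用几行代码示意:每个潜变量维度压缩到有界区间后四舍五入到固定格点,并用直通估计器(straight-through)保留梯度。以下为通用示意,FSQ-dropout等论文细节未包含:

import torch

def fsq(z: torch.Tensor, levels: int = 5) -> torch.Tensor:
    """有限标量量化(FSQ)极简草图:前向量化、反向恒等。"""
    half = (levels - 1) / 2
    z = torch.tanh(z) * half          # 压缩到 (-half, half)
    z_q = torch.round(z)              # 量化到 levels 个均匀格点
    return z + (z_q - z).detach()     # straight-through 估计器

codes = fsq(torch.randn(4, 64, 32))  # (batch, time, dim) 的连续潜变量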


3D|3D重建等相关(2篇)

【1】P3D: Scalable Neural Surrogates for High-Resolution 3D Physics Simulations with Global Context
标题:P3D:具有全局上下文的高分辨率3D物理模拟的可扩展神经代理模型
链接:https://arxiv.org/abs/2509.10186

作者:Holzschuh, Georg Kohl, Florian Redinger, Nils Thuerey
摘要:我们提出了一个可扩展的框架,用于学习高分辨率3D物理模拟的确定性和概率性神经代理模型。我们引入了一种面向3D物理模拟的混合CNN-Transformer骨干架构,在速度和准确性方面明显优于现有架构。我们提出的网络可以在模拟域的小块(patch)上进行预训练,这些小块可以融合以获得全局解,并可选地由一个快速、可扩展的序列到序列模型引导,以纳入长程依赖关系。这种设置降低了高分辨率数据集对内存和计算的要求,从而允许训练大规模模型。我们将骨干架构与大量基线方法进行了对比评估,目标是同时学习3D中14种不同类型PDE的动力学。我们演示了如何将模型扩展到空间分辨率高达$512^3$的高分辨率各向同性湍流。最后,我们通过将网络训练为扩散模型来展示其多功能性,以生成不同雷诺数下高度湍流的3D通道流的概率样本,准确捕捉底层的流动统计特性。
摘要:We present a scalable framework for learning deterministic and probabilistic neural surrogates for high-resolution 3D physics simulations. We introduce a hybrid CNN-Transformer backbone architecture targeted for 3D physics simulations, which significantly outperforms existing architectures in terms of speed and accuracy. Our proposed network can be pretrained on small patches of the simulation domain, which can be fused to obtain a global solution, optionally guided via a fast and scalable sequence-to-sequence model to include long-range dependencies. This setup allows for training large-scale models with reduced memory and compute requirements for high-resolution datasets. We evaluate our backbone architecture against a large set of baseline methods with the objective to simultaneously learn the dynamics of 14 different types of PDEs in 3D. We demonstrate how to scale our model to high-resolution isotropic turbulence with spatial resolutions of up to $512^3$. Finally, we demonstrate the versatility of our network by training it as a diffusion model to produce probabilistic samples of highly turbulent 3D channel flows across varying Reynolds numbers, accurately capturing the underlying flow statistics.


【2】Accelerating 3D Photoacoustic Computed Tomography with End-to-End Physics-Aware Neural Operators
标题:使用端到端物理感知神经算子加速3D光声计算机断层扫描
链接:https://arxiv.org/abs/2509.09894

作者:ng, Yousuf Aborahama, Arya Khokhar, Yang Zhang, Chuwei Wang, Karteekeya Sastry, Julius Berner, Yilin Luo, Boris Bonev, Zongyi Li, Kamyar Azizzadenesheli, Lihong V. Wang, Anima Anandkumar
摘要:光声计算机断层扫描(PACT)结合了光学对比度和超声分辨率,实现了超越光学扩散极限的深层组织成像。虽然三维PACT系统能够为从经颅到乳腺成像的各种应用实现高分辨率体积成像,但当前的实现需要密集的换能器阵列和较长的采集时间,限制了临床转化。我们引入Pano(PACT成像神经算子),这是一种端到端的物理感知模型,可直接学习从传感器测量到体积重建的逆声学映射。与现有方法(例如通用反投影算法)不同,Pano同时学习物理先验和数据先验,并且与输入数据的分辨率无关。Pano采用球面离散-连续卷积以保持半球形传感器几何结构,引入亥姆霍兹方程约束以确保物理一致性,并能在不同的传感器配置下以分辨率无关的方式运行。我们展示了Pano在从模拟和真实实验数据重建高质量图像方面的鲁棒性和效率,即使在换能器数量显著减少和有限角度采集配置下也能保持稳定的性能。该框架在各种稀疏采样模式下保持重建保真度,同时支持实时体积成像能力。这一进展为使3D PACT在临床前研究和临床应用中更易获得、更加可行建立了一条实用途径,在不影响图像重建质量的情况下大幅降低了硬件要求。
摘要:Photoacoustic computed tomography (PACT) combines optical contrast with ultrasonic resolution, achieving deep-tissue imaging beyond the optical diffusion limit. While three-dimensional PACT systems enable high-resolution volumetric imaging for applications spanning transcranial to breast imaging, current implementations require dense transducer arrays and prolonged acquisition times, limiting clinical translation. We introduce Pano (PACT imaging neural operator), an end-to-end physics-aware model that directly learns the inverse acoustic mapping from sensor measurements to volumetric reconstructions. Unlike existing approaches (e.g. universal back-projection algorithm), Pano learns both physics and data priors while also being agnostic to the input data resolution. Pano employs spherical discrete-continuous convolutions to preserve hemispherical sensor geometry, incorporates Helmholtz equation constraints to ensure physical consistency and operates resolutionindependently across varying sensor configurations. We demonstrate the robustness and efficiency of Pano in reconstructing high-quality images from both simulated and real experimental data, achieving consistent performance even with significantly reduced transducer counts and limited-angle acquisition configurations. The framework maintains reconstruction fidelity across diverse sparse sampling patterns while enabling real-time volumetric imaging capabilities. This advancement establishes a practical pathway for making 3D PACT more accessible and feasible for both preclinical research and clinical applications, substantially reducing hardware requirements without compromising image reconstruction quality.
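
摘要提到的亥姆霍兹方程约束,可以示意性地实现为对预测压力场的PDE残差惩罚:用有限差分近似 $\nabla^2 p + k^2 p$,并将其范数作为物理一致性损失(纯属假设性草图,论文的实际实现方式未知):

import torch

def helmholtz_residual(p: torch.Tensor, k: float, dx: float) -> torch.Tensor:
    """对3D压力场 p 用中心差分计算亥姆霍兹残差 ∇²p + k²p。"""
    lap = (
        p[2:, 1:-1, 1:-1] + p[:-2, 1:-1, 1:-1] +
        p[1:-1, 2:, 1:-1] + p[1:-1, :-2, 1:-1] +
        p[1:-1, 1:-1, 2:] + p[1:-1, 1:-1, :-2] -
        6.0 * p[1:-1, 1:-1, 1:-1]
    ) / dx ** 2
    return lap + (k ** 2) * p[1:-1, 1:-1, 1:-1]

p = torch.randn(32, 32, 32, requires_grad=True)
phys_loss = helmholtz_residual(p, k=2.0, dx=0.1).pow(2).mean()  # 可加入总损失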


编码器(2篇)

【1】Intrinsic Dimension Estimating Autoencoder (IDEA) Using CancelOut Layer and a Projected Loss
标题:使用CancelOut层和投影损失的内在维数估计自动编码器(IDEA)
链接:https://arxiv.org/abs/2509.10011

作者:rioua, Philipp Krah, Julian Koellermeier
备注:Preprint with 12 pages and 12 figures
摘要:本文介绍了内在维数估计自动编码器(IDEA),它确定了广泛的数据集的样本位于线性或非线性流形上的潜在内在维数。除了估计内在维度之外,IDEA还能够在将原始数据集投影到相应的潜在空间之后重建原始数据集,该潜在空间使用重新加权的双CancelOut层进行结构化。我们的主要贡献是引入了投影重建损失项,通过在去除额外的潜在维度的情况下持续评估重建质量来指导模型的训练。我们首先评估的性能的IDEA的一系列理论基准,以验证其鲁棒性。这些实验使我们能够测试其重建能力,并比较其性能与国家的最先进的内在尺寸估计。基准测试表明,我们的方法具有良好的准确性和高通用性。随后,我们将我们的模型应用于从垂直分辨一维自由表面流的数值解产生的数据,在水平方向,垂直方向和时间上的垂直速度分布的逐点离散化。IDEA成功地估计了数据集的内在维度,然后通过直接在网络识别的投影空间内工作来重建原始解。
摘要 :This paper introduces the Intrinsic Dimension Estimating Autoencoder (IDEA), which identifies the underlying intrinsic dimension of a wide range of datasets whose samples lie on either linear or nonlinear manifolds. Beyond estimating the intrinsic dimension, IDEA is also able to reconstruct the original dataset after projecting it onto the corresponding latent space, which is structured using re-weighted double CancelOut layers. Our key contribution is the introduction of the projected reconstruction loss term, guiding the training of the model by continuously assessing the reconstruction quality under the removal of an additional latent dimension. We first assess the performance of IDEA on a series of theoretical benchmarks to validate its robustness. These experiments allow us to test its reconstruction ability and compare its performance with state-of-the-art intrinsic dimension estimators. The benchmarks show good accuracy and high versatility of our approach. Subsequently, we apply our model to data generated from the numerical solution of a vertically resolved one-dimensional free-surface flow, following a pointwise discretization of the vertical velocity profile in the horizontal direction, vertical direction, and time. IDEA succeeds in estimating the dataset's intrinsic dimension and then reconstructs the original solution by working directly within the projection space identified by the network.
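
CancelOut层本质上是一个可学习的逐维门控,训练后权重显著非零的维度数可作为内在维数的估计。下面是其基本机制的极简示意(IDEA中的重加权双CancelOut结构与投影重建损失为论文细节,此处未包含):

import torch
import torch.nn as nn

class CancelOut(nn.Module):
    """CancelOut 门控层草图:为每个输入维度学习一个权重,
    经 sigmoid 后逐元素相乘;权重趋近于零的维度即被"取消"。"""
    def __init__(self, dim: int):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(dim) + 4.0)  # 初始几乎全通

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(self.weights)

# 训练后,sigmoid(weights) 中显著大于零的个数可作为内在维数的估计
z = CancelOut(dim=20)(torch.randn(8, 20))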


【2】A Multimodal RAG Framework for Housing Damage Assessment: Collaborative Optimization of Image Encoding and Policy Vector Retrieval
标题:房屋损毁评估的多模态RAG框架:图像编码与保单向量检索的协同优化
链接:https://arxiv.org/abs/2509.09721

作者:o, Dingxin Lu, Zhuqi Wang
摘要:自然灾害发生后,对房屋损毁的准确评估对保险理赔响应和资源规划至关重要。在这项工作中,我们介绍了一种新颖的多模态检索增强生成(MM-RAG)框架。在经典RAG架构之上,我们进一步设计了一种双分支多模态编码器结构:图像分支采用由ResNet和Transformer组成的视觉编码器提取灾后建筑物损毁特征,文本分支利用BERT检索器对帖子和保险单进行文本向量化,并构建可检索的修复索引。为实现跨模态语义对齐,该模型集成了跨模态交互模块,通过多头注意力在图像和文本之间架起语义表示的桥梁。同时,在生成模块中引入模态注意力门控机制,动态控制视觉证据和文本先验信息在生成过程中的作用。整个框架采用端到端训练,将对比损失、检索损失和生成损失结合起来构成多任务优化目标,在协同学习中实现图像理解与保单匹配。结果表明,该方法在检索精度和损毁严重程度分类指标上表现优越,其中Top-1检索精度提高了9.6%。
摘要:After natural disasters, accurate evaluations of damage to housing are important for insurance claims response and planning of resources. In this work, we introduce a novel multimodal retrieval-augmented generation (MM-RAG) framework. On top of classical RAG architecture, we further the framework to devise a two-branch multimodal encoder structure that the image branch employs a visual encoder composed of ResNet and Transformer to extract the characteristic of building damage after disaster, and the text branch harnesses a BERT retriever for the text vectorization of posts as well as insurance policies and for the construction of a retrievable restoration index. To impose cross-modal semantic alignment, the model integrates a cross-modal interaction module to bridge the semantic representation between image and text via multi-head attention. Meanwhile, in the generation module, the introduced modal attention gating mechanism dynamically controls the role of visual evidence and text prior information during generation. The entire framework takes end-to-end training, and combines the comparison loss, the retrieval loss and the generation loss to form multi-task optimization objectives, and achieves image understanding and policy matching in collaborative learning. The results demonstrate superior performance in retrieval accuracy and classification index on damage severity, where the Top-1 retrieval accuracy has been improved by 9.6%.


优化|敛散性(2篇)

【1】Hadamard-Riemannian Optimization for Margin-Variance Ensemble
标题:面向间隔-方差集成的Hadamard-黎曼优化
链接:https://arxiv.org/abs/2509.10189

作者
摘要:集成学习被广泛认为是通过组合多个基模型来提升预测性能的关键技术。然而,传统的基于间隔(margin)的集成方法主要集中于最大化期望间隔,而忽略了间隔方差的关键作用,这从本质上限制了模型的泛化能力,并加剧了其过拟合的脆弱性,尤其是在噪声或不平衡数据集上。此外,在概率单纯形内优化集成权重的传统做法往往带来计算效率低下和可扩展性方面的挑战,使其难以应用于大规模问题。为了解决这些限制,本文介绍了一种新的集成学习框架,将间隔方差显式地纳入损失函数。我们的方法联合优化负期望间隔及其方差,从而增强了鲁棒性并改善了泛化性能。此外,通过将集成权重重参数化到单位球面上,我们大大简化了优化过程并提高了计算效率。在多个基准数据集上进行的大量实验表明,所提出的方法始终优于传统的基于间隔的集成技术,凸显了其有效性和实用价值。
摘要:Ensemble learning has been widely recognized as a pivotal technique for boosting predictive performance by combining multiple base models. Nevertheless, conventional margin-based ensemble methods predominantly focus on maximizing the expected margin while neglecting the critical role of margin variance, which inherently restricts the generalization capability of the model and heightens its vulnerability to overfitting, particularly in noisy or imbalanced datasets. Additionally, the conventional approach of optimizing ensemble weights within the probability simplex often introduces computational inefficiency and scalability challenges, complicating its application to large-scale problems. To tackle these limitations, this paper introduces a novel ensemble learning framework that explicitly incorporates margin variance into the loss function. Our method jointly optimizes the negative expected margin and its variance, leading to enhanced robustness and improved generalization performance. Moreover, by reparameterizing the ensemble weights onto the unit sphere, we substantially simplify the optimization process and improve computational efficiency. Extensive experiments conducted on multiple benchmark datasets demonstrate that the proposed approach consistently outperforms traditional margin-based ensemble techniques, underscoring its effectiveness and practical utility.
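
将集成权重重参数化到单位球面的做法可如下示意:取单位向量 u 的逐元素平方(Hadamard积)即得到非负且和为1的权重,从而把单纯形约束转化为球面上的无约束优化。以下为假设性草图,间隔矩阵与方差项权重均为示意取值:

import torch

# margins: (n_samples, n_models),第 (i, j) 元为第 j 个基模型在样本 i 上的间隔
margins = torch.randn(1024, 10)
v = torch.randn(10, requires_grad=True)
opt = torch.optim.Adam([v], lr=0.01)

for _ in range(200):
    u = v / v.norm()                   # 单位球面上的点
    w = u * u                          # Hadamard 平方 => 非负且和为 1(单纯形)
    m = margins @ w                    # 每个样本的集成间隔
    loss = -m.mean() + 0.5 * m.var()   # 负期望间隔 + 方差(系数 0.5 为假设)
    opt.zero_grad(); loss.backward(); opt.step()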


【2】Generative Engine Optimization: How to Dominate AI Search
标题:生成式引擎优化:如何主导人工智能搜索
链接:https://arxiv.org/abs/2509.08919

作者:, Xiaoxuan Wang, Kaiwen Chen, Nick Koudas
摘要:ChatGPT、Perplexity和Gemini等生成式人工智能搜索引擎的快速普及正在从根本上重塑信息检索,使其从传统的排名列表转向综合的、有引用支撑的答案。这种转变挑战了现有的搜索引擎优化(SEO)实践,并催生了一种新范式,我们称之为生成引擎优化(GEO)。   本文对AI搜索和传统网络搜索(Google)进行了全面的比较分析。通过一系列跨多个垂直领域、语言和查询改写的大规模受控实验,我们量化了这些系统获取信息方式的关键差异。我们的主要发现表明,相对于品牌自有内容和社交内容,AI搜索对赢得媒体(earned media,即第三方权威来源)表现出系统性且压倒性的偏向,这与Google更为均衡的构成形成鲜明对比。我们进一步证明,各AI搜索服务在领域多样性、新鲜度、跨语言稳定性和对措辞的敏感性方面彼此差异显著。   基于这些实证结果,我们制定了战略性的GEO议程。我们为从业者提供了可操作的指导,强调以下关键需求:(1)为机器可扫描性和可论证性设计内容;(2)主导赢得媒体以建立AI感知的权威性;(3)采用针对特定引擎和语言的策略;(4)帮助利基参与者克服固有的"大品牌偏见"。我们的工作为在新的生成式搜索格局中获得可见性提供了基础性的实证分析和战略框架。
摘要 :The rapid adoption of generative AI-powered search engines like ChatGPT, Perplexity, and Gemini is fundamentally reshaping information retrieval, moving from traditional ranked lists to synthesized, citation-backed answers. This shift challenges established Search Engine Optimization (SEO) practices and necessitates a new paradigm, which we term Generative Engine Optimization (GEO).   This paper presents a comprehensive comparative analysis of AI Search and traditional web search (Google). Through a series of large-scale, controlled experiments across multiple verticals, languages, and query paraphrases, we quantify critical differences in how these systems source information. Our key findings reveal that AI Search exhibit a systematic and overwhelming bias towards Earned media (third-party, authoritative sources) over Brand-owned and Social content, a stark contrast to Google's more balanced mix. We further demonstrate that AI Search services differ significantly from each other in their domain diversity, freshness, cross-language stability, and sensitivity to phrasing.   Based on these empirical results, we formulate a strategic GEO agenda. We provide actionable guidance for practitioners, emphasizing the critical need to: (1) engineer content for machine scannability and justification, (2) dominate earned media to build AI-perceived authority, (3) adopt engine-specific and language-aware strategies, and (4) overcome the inherent "big brand bias" for niche players. Our work provides the foundational empirical analysis and a strategic framework for achieving visibility in the new generative search landscape.


预测|估计(4篇)

【1】ARMA Block: A CNN-Based Autoregressive and Moving Average Module for Long-Term Time Series Forecasting
标题:ARMA Block:一个基于CNN的自回归和移动平均模块,用于长期时间序列预测
链接:https://arxiv.org/abs/2509.10324

作者: Kim, YeongHyeon Park, Il Dong Yun
摘要:本文提出了一种简单而有效的卷积模块,用于长期时间序列预测。受自回归积分移动平均(ARIMA)模型的启发,所提出的块由两个卷积分量组成:一个用于捕获趋势(自回归),另一个用于细化局部变化(移动平均)。与需要迭代多步预测的传统ARIMA不同,该模块直接执行多步预测,使其易于扩展到多变量设置。在九个广泛使用的基准数据集上的实验表明,我们的方法ARMA实现了有竞争力的准确性,特别是在表现出强烈趋势变化的数据集上,同时保持了体系结构的简单性。此外,分析表明,该块固有地编码绝对位置信息,这表明它有可能作为顺序模型中位置嵌入的轻量级替代品。
摘要:This paper proposes a simple yet effective convolutional module for long-term time series forecasting. The proposed block, inspired by the Auto-Regressive Integrated Moving Average (ARIMA) model, consists of two convolutional components: one for capturing the trend (autoregression) and the other for refining local variations (moving average). Unlike conventional ARIMA, which requires iterative multi-step forecasting, the block directly performs multi-step forecasting, making it easily extendable to multivariate settings. Experiments on nine widely used benchmark datasets demonstrate that our method ARMA achieves competitive accuracy, particularly on datasets exhibiting strong trend variations, while maintaining architectural simplicity. Furthermore, analysis shows that the block inherently encodes absolute positional information, suggesting its potential as a lightweight replacement for positional embeddings in sequential models.
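
该模块的思路可以用两支一维卷积来示意:一支全感受野卷积捕获趋势(自回归),一支局部卷积细化局部变化(移动平均),并由线性头直接输出多步预测。以下为假设性草图,非论文官方实现:

import torch
import torch.nn as nn

class ARMABlock(nn.Module):
    """受 ARIMA 启发的卷积模块草图(假设性实现)。"""
    def __init__(self, input_len: int, horizon: int, ma_kernel: int = 25):
        super().__init__()
        self.ar = nn.Conv1d(1, 1, kernel_size=input_len)      # 全感受野,捕获趋势
        self.ma = nn.Conv1d(1, 1, kernel_size=ma_kernel,
                            padding=ma_kernel // 2)           # 局部平滑
        self.head = nn.Linear(input_len, horizon)             # 多步直接预测

    def forward(self, x):                 # x: (B, L) 单变量序列
        x = x.unsqueeze(1)                # (B, 1, L)
        trend = self.ar(x)                # (B, 1, 1)
        local = self.ma(x)                # (B, 1, L)
        return self.head(local.squeeze(1)) + trend.squeeze(-1)  # (B, horizon)

y_hat = ARMABlock(input_len=96, horizon=24)(torch.randn(8, 96))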


【2】Predictive Spike Timing Enables Distributed Shortest Path Computation in Spiking Neural Networks
标题:预测尖峰时间实现尖峰神经网络中的分布式最短路径计算
链接:https://arxiv.org/abs/2509.10077

作者:resund, Kristian Valset Aars, Robin Dietrich, Nicolai Waniek
摘要:高效的规划和序列选择是智能的核心,但当前的方法在很大程度上仍与生物计算不相容。像Dijkstra或A*这样的经典图算法需要全局状态以及回溯等在生物学上不合理的操作,而强化学习方法依赖缓慢的基于梯度的策略更新,这与自然系统中观察到的快速行为适应不一致。   我们提出了一种生物学上合理的最短路径计算算法,它通过具有现实处理延迟的局部基于脉冲的消息传递来运行。该算法利用脉冲时间的重合来识别最优路径上的节点:比预测更早接收到抑制性-兴奋性消息对的神经元会缩短其响应延迟,从而产生一种从目标向源反向传播的时间压缩。通过分析证明和在随机空间网络上的仿真,我们证明了该算法收敛,并能使用纯粹基于时间的机制发现所有最短路径。通过展示短期时间动力学本身即可计算最短路径,这项工作为理解生物网络如何通过纯局部计算和相对脉冲时间预测来解决复杂计算问题提供了新的见解。这些发现为理解生物和人工系统中的分布式计算开辟了新方向,可能对计算神经科学、人工智能、强化学习和神经形态系统产生影响。
摘要:Efficient planning and sequence selection are central to intelligence, yet current approaches remain largely incompatible with biological computation. Classical graph algorithms like Dijkstra's or A* require global state and biologically implausible operations such as backtracing, while reinforcement learning methods rely on slow gradient-based policy updates that appear inconsistent with rapid behavioral adaptation observed in natural systems.   We propose a biologically plausible algorithm for shortest-path computation that operates through local spike-based message-passing with realistic processing delays. The algorithm exploits spike-timing coincidences to identify nodes on optimal paths: Neurons that receive inhibitory-excitatory message pairs earlier than predicted reduce their response delays, creating a temporal compression that propagates backwards from target to source. Through analytical proof and simulations on random spatial networks, we demonstrate that the algorithm converges and discovers all shortest paths using purely timing-based mechanisms. By showing how short-term timing dynamics alone can compute shortest paths, this work provides new insights into how biological networks might solve complex computational problems through purely local computation and relative spike-time prediction. These findings open new directions for understanding distributed computation in biological and artificial systems, with possible implications for computational neuroscience, AI, reinforcement learning, and neuromorphic systems.
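
论文时间机制的直观内核是"最早脉冲到达时间即最短路径长度"。下面用事件队列模拟这一到达时间传播(本质上等价于Dijkstra的事件驱动形式),不含论文中的预测性延迟调整与抑制-兴奋消息对等细节:

import heapq

def spike_arrival_times(graph, source):
    """把边权视为传导延迟,记录每个节点最早收到脉冲的时间。"""
    arrival = {source: 0.0}
    queue = [(0.0, source)]           # (到达时间, 节点)
    while queue:
        t, u = heapq.heappop(queue)
        if t > arrival.get(u, float("inf")):
            continue                  # 已有更早的脉冲到达,忽略
        for v, delay in graph[u]:
            if t + delay < arrival.get(v, float("inf")):
                arrival[v] = t + delay
                heapq.heappush(queue, (t + delay, v))
    return arrival

g = {"s": [("a", 1.0), ("b", 4.0)], "a": [("b", 1.5)], "b": []}
print(spike_arrival_times(g, "s"))   # {'s': 0.0, 'a': 1.0, 'b': 2.5}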


【3】Data-Driven Energy Estimation for Virtual Servers Using Combined System Metrics and Machine Learning
标题:结合系统指标与机器学习的虚拟服务器数据驱动能耗估计
链接:https://arxiv.org/abs/2509.09991

作者:angha
摘要:本文提出了一种基于机器学习的方法来估计虚拟服务器的能耗,而无需访问物理功率测量接口。利用从客户虚拟机收集的资源利用率指标,我们训练梯度提升回归器来预测通过主机上的RAPL测得的能耗。我们首次展示了在无特权主机访问的条件下、仅基于客户机资源指标的能耗估计,并在多种工作负载上进行了实验,取得了较高的预测精度和方差解释度($0.90 \leq R^2 \leq 0.97$),表明客户机侧能耗估计的可行性。这种方法可以在虚拟化环境中实现能耗感知调度、成本优化以及独立于物理主机的能耗估计。我们的方法填补了虚拟化环境(例如云)中的一个关键空白,在这类环境中直接的能耗测量是不可行的。
摘要:This paper presents a machine learning-based approach to estimate the energy consumption of virtual servers without access to physical power measurement interfaces. Using resource utilization metrics collected from guest virtual machines, we train a Gradient Boosting Regressor to predict energy consumption measured via RAPL on the host. We demonstrate, for the first time, guest-only resource-based energy estimation without privileged host access with experiments across diverse workloads, achieving high predictive accuracy and variance explained ($0.90 \leq R^2 \leq 0.97$), indicating the feasibility of guest-side energy estimation. This approach can enable energy-aware scheduling, cost optimization and physical host independent energy estimates in virtualized environments. Our approach addresses a critical gap in virtualized environments (e.g. cloud) where direct energy measurement is infeasible.
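
方法主体可以用几行sklearn代码示意:以客户机可见的资源利用率指标为特征,训练梯度提升回归器拟合主机RAPL能耗(以下特征与数据均为随机占位,仅示意流程):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# 假设性特征:CPU 利用率、内存占用、磁盘/网络吞吐等(随机占位数据)
X = np.random.rand(2000, 6)
y = 50 * X[:, 0] + 10 * X[:, 1] + np.random.normal(0, 2, 2000)  # 模拟能耗(J)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05)
model.fit(X_tr, y_tr)
print("R^2 =", r2_score(y_te, model.predict(X_te)))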


【4】DCHO: A Decomposition-Composition Framework for Predicting Higher-Order Brain Connectivity to Enhance Diverse Downstream Applications
标题:DCHO:一个用于预测高阶脑连接以增强多样化下游应用的分解-组合框架
链接:https://arxiv.org/abs/2509.09696

作者:, Wendu Li, Quanying Liu
摘要:高阶脑连接(HOBC)捕捉三个或更多脑区之间的相互作用,提供了比传统成对功能连接(FC)更丰富的组织信息。最近的研究已开始从非侵入性成像数据推断潜在的HOBC,但这些研究主要集中于静态分析,限制了其在动态预测任务中的适用性。为弥补这一空白,我们提出了DCHO,一种基于分解-组合框架、用于建模和预测HOBC时间演化的统一方法,既适用于非预测任务(状态分类),也适用于预测任务(脑动力学预测)。DCHO采用分解-组合策略,将预测任务重新表述为两个可处理的子问题:HOBC推断和潜在轨迹预测。在推断阶段,我们提出了双视图编码器来提取多尺度拓扑特征,并用潜在组合学习器来捕获高层次的HOBC信息。在预测阶段,我们引入了潜在空间预测损失以加强时间轨迹的建模。在多个神经影像数据集上的大量实验表明,DCHO在非预测任务(状态分类)和预测任务(脑动力学预测)中均取得了优异的性能,显著优于现有方法。
摘要 :Higher-order brain connectivity (HOBC), which captures interactions among three or more brain regions, provides richer organizational information than traditional pairwise functional connectivity (FC). Recent studies have begun to infer latent HOBC from noninvasive imaging data, but they mainly focus on static analyses, limiting their applicability in dynamic prediction tasks. To address this gap, we propose DCHO, a unified approach for modeling and forecasting the temporal evolution of HOBC based on a Decomposition-Composition framework, which is applicable to both non-predictive tasks (state classification) and predictive tasks (brain dynamics forecasting). DCHO adopts a decomposition-composition strategy that reformulates the prediction task into two manageable subproblems: HOBC inference and latent trajectory prediction. In the inference stage, we propose a dual-view encoder to extract multiscale topological features and a latent combinatorial learner to capture high-level HOBC information. In the forecasting stage, we introduce a latent-space prediction loss to enhance the modeling of temporal trajectories. Extensive experiments on multiple neuroimaging datasets demonstrate that DCHO achieves superior performance in both non-predictive tasks (state classification) and predictive tasks (brain dynamics forecasting), significantly outperforming existing methods.


其他神经网络|深度学习|模型|建模(8篇)

【1】Is In-Context Learning Learning?
标题:上下文学习是学习吗?
链接:https://arxiv.org/abs/2509.10414

作者: Wynter
备注:Director's cut
摘要:上下文学习(ICL)允许某些自回归模型通过下一个令牌预测来解决任务,而无需进一步训练。这引发了关于此类模型仅凭提示中的少量示例(few-shot)即可解决(学习)未见任务的能力的论断。然而,演绎并不总是意味着学习,因为ICL并不显式编码给定的观察,而是依赖模型的先验知识以及所给的示例(如果有的话)。我们认为,从数学上讲,ICL确实构成学习,但其完整刻画需要实证工作。随后,我们对ICL进行了大规模分析,通过消融或控制记忆、预训练、分布偏移以及提示风格与措辞等因素。我们发现ICL是一种有效的学习范式,但其学习并泛化到未见任务的能力有限。我们注意到,在示例数量趋多的极限下,准确率对示例分布、模型、提示风格和输入的语言特征并不敏感;相反,它从提示中的规律性推断模式,这导致了分布敏感性,尤其是在思维链等提示风格中。鉴于形式上相似的任务呈现出不同的准确率,我们得出结论:自回归的临时编码并不是一种鲁棒的机制,并表明其通用泛化能力有限。
摘要:In-context learning (ICL) allows some autoregressive models to solve tasks via next-token prediction and without needing further training. This has led to claims about these model's ability to solve (learn) unseen tasks with only a few shots (exemplars) in the prompt. However, deduction does not always imply learning, as ICL does not explicitly encode a given observation. Instead, the models rely on their prior knowledge and the exemplars given, if any. We argue that, mathematically, ICL does constitute learning, but its full characterisation requires empirical work. We then carry out a large-scale analysis of ICL ablating out or accounting for memorisation, pretraining, distributional shifts, and prompting style and phrasing. We find that ICL is an effective learning paradigm, but limited in its ability to learn and generalise to unseen tasks. We note that, in the limit where exemplars become more numerous, accuracy is insensitive to exemplar distribution, model, prompt style, and the input's linguistic features. Instead, it deduces patterns from regularities in the prompt, which leads to distributional sensitivity, especially in prompting styles such as chain-of-thought. Given the varied accuracies on formally similar tasks, we conclude that autoregression's ad-hoc encoding is not a robust mechanism, and suggests limited all-purpose generalisability.


【2】Data distribution impacts the performance and generalisability of contrastive learning-based foundation models of electrocardiograms
标题:数据分布影响基于对比学习的心电图基础模型的性能和可推广性
链接:https://arxiv.org/abs/2509.10369

作者:Khattak, Konstantinos Patlatzoglou, Joseph Barker, Libor Pastika, Boroumand Zeidaabadi, Ahmed El-Medany, Hesham Aggour, Yixiu Liang, Antonio H. Ribeiro, Jeffrey Annis, Antonio Luiz Pinho Ribeiro, Junbo Ge, Daniel B. Kramer, Jonathan W. Waks, Evan Brittain, Nicholas Peters, Fu Siong Ng, Arunashis Sau
备注:Currently under review at npj Digital Medicine
摘要:对比学习是一种被广泛采用的自监督预训练策略,但其对队列构成的依赖性仍未得到充分研究。我们提出了"通过患者增强心电图进行对比"(CAPE)基础模型,并在来自三大洲(北美、南美、亚洲)不同人群的四个队列(n = 5,203,352)上进行预训练。我们系统地评估了队列人口统计学特征、健康状况和人群多样性如何影响下游预测任务的性能,其中还纳入了来自另一大洲(欧洲)的另外两个队列。我们发现,下游性能取决于预训练队列的分布特性,包括人口统计学特征和健康状况。此外,虽然使用多中心、人口统计学上多样化的队列进行预训练可以提高分布内准确性,但它会通过编码队列特有的伪影而降低我们对比方法的分布外(OOD)泛化能力。为解决这一问题,我们提出了分布内批次(IDB)策略,该策略在预训练期间保持队列内的一致性,并增强了OOD鲁棒性。这项工作为开发临床上公平且可泛化的基础模型提供了重要见解。
摘要:Contrastive learning is a widely adopted self-supervised pretraining strategy, yet its dependence on cohort composition remains underexplored. We present Contrasting by Patient Augmented Electrocardiograms (CAPE) foundation model and pretrain on four cohorts (n = 5,203,352), from diverse populations across three continents (North America, South America, Asia). We systematically assess how cohort demographics, health status, and population diversity influence the downstream performance for prediction tasks also including two additional cohorts from another continent (Europe). We find that downstream performance depends on the distributional properties of the pretraining cohort, including demographics and health status. Moreover, while pretraining with a multi-centre, demographically diverse cohort improves in-distribution accuracy, it reduces out-of-distribution (OOD) generalisation of our contrastive approach by encoding cohort-specific artifacts. To address this, we propose the In-Distribution Batch (IDB) strategy, which preserves intra-cohort consistency during pretraining and enhances OOD robustness. This work provides important insights for developing clinically fair and generalisable foundation models.
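
分布内批次(IDB)策略的要点是让每个预训练batch只来自同一队列。下面是一个假设性的采样器草图,示意这一约束如何实现:

import numpy as np

def in_distribution_batches(cohort_ids, batch_size, rng=np.random.default_rng(0)):
    """每个 batch 只从同一队列内采样,保持批内一致性,
    以避免对比学习把队列特有伪影当作判别信号(假设性实现)。"""
    idx_by_cohort = {c: np.flatnonzero(cohort_ids == c)
                     for c in np.unique(cohort_ids)}
    while True:
        c = rng.choice(list(idx_by_cohort))            # 先抽一个队列
        yield rng.choice(idx_by_cohort[c], batch_size, replace=False)

cohorts = np.repeat([0, 1, 2, 3], 1000)                # 4 个队列的样本标签
batch_indices = next(in_distribution_batches(cohorts, batch_size=256))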


【3】Physics-informed sensor coverage through structure preserving machine learning
标题:通过结构保留机器学习实现基于物理信息的传感器覆盖
链接:https://arxiv.org/abs/2509.10363

作者:David Shaffer, Brooks Kinch, Joseph Klobusicky, M. Ani Hsieh, Nathaniel Trask
摘要:我们提出了一个用于自适应源定位的机器学习框架,其中智能体使用耦合流体动力学-输运系统的结构保持数字孪生进行实时轨迹规划和数据同化。该孪生模型由条件神经惠特尼形式(CNWF)构建,将有限元外微积分(FEEC)的数值保证与基于Transformer的算子学习相结合。所得模型保持离散守恒,并能实时适应流式传感器数据。它采用条件注意力机制来识别:约简的Whitney形式基、约简的积分平衡方程以及源场,三者均与给定的传感器测量相容。由此导出的降阶环境模型保留了标准有限元模拟的稳定性和一致性,产生了从传感器数据到源场的物理可实现的正则映射。我们提出了一种交错方案,交替执行数字孪生评估与劳埃德(Lloyd)算法以指导传感器布置,并通过分析给出了覆盖泛函单调改进的条件。将预测的源场用作最优恢复方案中的重要性函数,我们在连续性假设下证明了点源的恢复,突出了正则性作为可定位性充分条件的作用。与物理无关的Transformer架构的实验比较表明,在复杂几何中施加物理约束可提高准确性,说明结构保持为源识别提供了有效的归纳偏置。
摘要 :We present a machine learning framework for adaptive source localization in which agents use a structure-preserving digital twin of a coupled hydrodynamic-transport system for real-time trajectory planning and data assimilation. The twin is constructed with conditional neural Whitney forms (CNWF), coupling the numerical guarantees of finite element exterior calculus (FEEC) with transformer-based operator learning. The resulting model preserves discrete conservation, and adapts in real time to streaming sensor data. It employs a conditional attention mechanism to identify: a reduced Whitney-form basis; reduced integral balance equations; and a source field, each compatible with given sensor measurements. The induced reduced-order environmental model retains the stability and consistency of standard finite-element simulation, yielding a physically realizable, regular mapping from sensor data to the source field. We propose a staggered scheme that alternates between evaluating the digital twin and applying Lloyd's algorithm to guide sensor placement, with analysis providing conditions for monotone improvement of a coverage functional. Using the predicted source field as an importance function within an optimal-recovery scheme, we demonstrate recovery of point sources under continuity assumptions, highlighting the role of regularity as a sufficient condition for localization. Experimental comparisons with physics-agnostic transformer architectures show improved accuracy in complex geometries when physical constraints are enforced, indicating that structure preservation provides an effective inductive bias for source identification.
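
传感器布置所用的加权Lloyd迭代可示意如下:以数字孪生预测的源场重要性为权重,交替执行"最近传感器分配"与"加权质心更新"(假设性草图,重要性函数与二维域均为示意):

import numpy as np

def weighted_lloyd(points, weights, k=5, iters=20, rng=np.random.default_rng(0)):
    """加权 Lloyd 算法草图:points 为候选位置,weights 为预测的重要性。"""
    sensors = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(points[:, None] - sensors[None], axis=-1)
        assign = d.argmin(axis=1)                       # 每个点的最近传感器
        for j in range(k):
            mask = assign == j
            if mask.any():
                w = weights[mask]
                sensors[j] = (points[mask] * w[:, None]).sum(0) / w.sum()
    return sensors

pts = np.random.rand(2000, 2)
imp = np.exp(-np.linalg.norm(pts - 0.7, axis=1) ** 2 / 0.02)  # 假设的源场重要性
print(weighted_lloyd(pts, imp))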


【4】A Certifiable Machine Learning-Based Pipeline to Predict Fatigue Life of Aircraft Structures
标题:一种可认证的基于机器学习的流程,用于预测飞机结构疲劳寿命
链接:https://arxiv.org/abs/2509.10227

作者:rón, Miguel Sánchez-Domínguez, Javier Rozalén, Fernando R. Sánchez, Javier de Vicente, Lucas Lacasa, Eusebio Valero, Gonzalo Rubio
备注:29 pages, 15 figures
摘要:疲劳寿命预测在任何飞机的设计和运行阶段都必不可少;就此而言,航空航天工业的安全要求尽早发现疲劳裂纹,以防止飞行中的故障。因此,稳健而精确的疲劳寿命预测器对保障安全至关重要。传统的工程方法虽然可靠,但耗时且涉及复杂的工作流程,包括执行多次有限元法(FEM)仿真、推导预期载荷谱,以及应用峰谷法或雨流计数等循环计数技术。这些步骤通常需要多个团队和工具之间的协作,再加上得到疲劳寿命预测所需的计算时间和工作量。机器学习(ML)为传统疲劳寿命估计方法提供了一个有前景的补充,能够实现更快的迭代和泛化,提供快速估计,与传统仿真一起指导决策。   在本文中,我们提出了一个基于ML的流程,旨在根据飞机在整个服役寿命中将执行的各项任务的飞行参数,估计机翼不同位置的疲劳寿命。我们在疲劳寿命估计的真实用例中验证了该流程,得到了准确的预测,并进行了全面的统计验证和不确定性量化。我们的流程通过减少昂贵的仿真数量来补充传统方法,从而降低所需的计算和人力资源。
摘要:Fatigue life prediction is essential in both the design and operational phases of any aircraft, and in this sense safety in the aerospace industry requires early detection of fatigue cracks to prevent in-flight failures. Robust and precise fatigue life predictors are thus essential to ensure safety. Traditional engineering methods, while reliable, are time consuming and involve complex workflows, including steps such as conducting several Finite Element Method (FEM) simulations, deriving the expected loading spectrum, and applying cycle counting techniques like peak-valley or rainflow counting. These steps often require collaboration between multiple teams and tools, added to the computational time and effort required to achieve fatigue life predictions. Machine learning (ML) offers a promising complement to traditional fatigue life estimation methods, enabling faster iterations and generalization, providing quick estimates that guide decisions alongside conventional simulations.   In this paper, we present a ML-based pipeline that aims to estimate the fatigue life of different aircraft wing locations given the flight parameters of the different missions that the aircraft will be operating throughout its operational life. We validate the pipeline in a realistic use case of fatigue life estimation, yielding accurate predictions alongside a thorough statistical validation and uncertainty quantification. Our pipeline constitutes a complement to traditional methodologies by reducing the amount of costly simulations and, thereby, lowering the required computational and human resources.


【5】Variational Neural Networks for Observable Thermodynamics (V-NOTS)
标题:可观测热力学的变分神经网络(V-NOTS)
链接:https://arxiv.org/abs/2509.09899

作者:er Eldred, François Gay-Balmaz, Vakhtang Putkaradze
备注:26 pages, 6 figures
摘要:基于数据的物理系统演化计算近年来受到广泛关注。在这类方法中,利用相空间中过去轨迹数据点的信息来重建运动方程,并预测此前未观测到的未来解。然而,在许多情况下,可用数据并不对应于定义系统相空间的变量。我们将注意力集中在耗散动力系统这一重要例子上。在这种情况下,相空间由坐标、动量和熵组成;然而,动量和熵通常无法被直接观测。为了解决这个困难,我们构建了一种基于\emph{热力学拉格朗日量}的新方法,开发了一个完全基于可观测变量的高效数据驱动计算框架,并构建了尊重热力学规律、保证熵演化非减的神经网络。我们表明,该网络能够基于有限数量的数据点和相对较少的系统参数,对相空间演化给出有效的描述。
摘要:Much attention has recently been devoted to data-based computing of evolution of physical systems. In such approaches, information about data points from past trajectories in phase space is used to reconstruct the equations of motion and to predict future solutions that have not been observed before. However, in many cases, the available data does not correspond to the variables that define the system's phase space. We focus our attention on the important example of dissipative dynamical systems. In that case, the phase space consists of coordinates, momenta and entropies; however, the momenta and entropies cannot, in general, be observed directly. To address this difficulty, we develop an efficient data-based computing framework based exclusively on observable variables, by constructing a novel approach based on the \emph{thermodynamic Lagrangian}, and constructing neural networks that respect the thermodynamics and guarantees the non-decreasing entropy evolution. We show that our network can provide an efficient description of phase space evolution based on a limited number of data points and a relatively small number of parameters in the system.


【6】World Modeling with Probabilistic Structure Integration
标题:概率结构集成的世界建模
链接:https://arxiv.org/abs/2509.09737

作者:tar, Wanhee Lee, Rahul Venkatesh, Honglin Chen, Daniel Bear, Jared Watrous, Simon Kim, Khai Loong Aw, Lilian Naing Chen, Stefan Stojanov, Kevin Feigelis, Imran Thobani, Alex Durango, Khaled Jedoui, Atlas Kazemian, Dan Yamins
摘要:我们提出了概率结构集成(PSI),一个从数据中学习高度可控、可灵活提示的世界模型的系统。PSI由三步循环组成。第一步"概率预测",以随机访问自回归序列模型的形式构建数据的概率图模型Psi。Psi支持一整套学习到的条件分布,描述数据中任意变量对任意其他变量集合的依赖关系。第二步"结构提取",我们展示了如何通过对Psi的因果推断,以零样本方式提取数据中潜在的低维属性,对应于一组多样而有意义的"中间结构"。第三步"集成",将这些结构转换为新的令牌类型,再作为条件信号和预测目标持续混合回训练数据,从而完成循环。每一个这样的循环都会增强Psi的能力,既使其能更好地对底层数据建模,又创造出新的控制手段,类似于LLM的通用提示语言。我们在1.4万亿个互联网视频数据令牌上训练了一个Psi实例;用它执行各种有用的视频预测和理解推断;提取出最先进的光流、自监督深度和对象分割;并利用这些结构支撑一个完整的预测改进循环。
摘要:We present Probabilistic Structure Integration (PSI), a system for learning richly controllable and flexibly promptable world models from data. PSI consists of a three-step cycle. The first step, Probabilistic prediction, involves building a probabilistic graphical model Psi of the data, in the form of a random-access autoregressive sequence model. Psi supports a complete set of learned conditional distributions describing the dependence of any variables in the data on any other set of variables. In step 2, Structure extraction, we show how to extract underlying low-dimensional properties in the data, corresponding to a diverse set of meaningful "intermediate structures", in a zero-shot fashion via causal inference on Psi. Step 3, Integration, completes the cycle by converting these structures into new token types that are then continually mixed back into the training diet as conditioning signals and prediction targets. Each such cycle augments the capabilities of Psi, both allowing it to model the underlying data better, and creating new control handles -- akin to an LLM-like universal prompting language. We train an instance of Psi on 1.4 trillion tokens of internet video data; we use it to perform a variety of useful video prediction and understanding inferences; we extract state-of-the-art optical flow, self-supervised depth and object segmentation; and we use these structures to support a full cycle of predictive improvements.


【7】An Information-Theoretic Framework for Credit Risk Modeling: Unifying Industry Practice with Statistical Theory for Fair and Interpretable Scorecards
标题:信用风险建模的信息理论框架:将行业实践与统计理论统一起来,形成公平且可解释的记分卡
链接:https://arxiv.org/abs/2509.09855

作者:ianto, Denis Burakov
摘要:信用风险建模在特征工程上广泛依赖证据权重(WoE)和信息价值(IV),在漂移监测上依赖群体稳定性指数(PSI),但它们的理论基础一直彼此脱节。我们建立了一个统一的信息论框架,揭示这些行业标准度量都是经典信息散度的实例。具体而言,我们证明了在相同分箱上对好、坏信用结果计算时,IV恰好等于PSI(Jeffreys散度)。通过将delta方法应用于WoE变换,我们推导出IV和PSI的标准误,首次实现了正式的假设检验和概率性公平约束。我们将信用建模内在的性能-公平权衡形式化为:最大化预测特征的IV以获得预测能力,同时最小化受保护属性的IV。使用深度为1的XGBoost树桩进行自动分箱,我们比较了三种编码策略:独热编码的逻辑回归、WoE变换和带约束的XGBoost。所有方法都达到相当的预测性能(AUC 0.82-0.84),表明有原则的信息论分箱比编码方式的选择更重要。混合整数规划在性能-公平边界上追踪帕累托有效解,并进行不确定性量化。该框架将理论与实践联系起来,为广泛使用的信用风险度量提供了首个严格的统计基础,同时为在受监管环境中平衡准确性与公平性提供了有原则的工具。
摘要:Credit risk modeling relies extensively on Weight of Evidence (WoE) and Information Value (IV) for feature engineering, and Population Stability Index (PSI) for drift monitoring, yet their theoretical foundations remain disconnected. We establish a unified information-theoretic framework revealing these industry-standard metrics as instances of classical information divergences. Specifically, we prove that IV exactly equals PSI (Jeffreys divergence) computed between good and bad credit outcomes over identical bins. Through the delta method applied to WoE transformations, we derive standard errors for IV and PSI, enabling formal hypothesis testing and probabilistic fairness constraints for the first time. We formalize credit modeling's inherent performance-fairness trade-off as maximizing IV for predictive power while minimizing IV for protected attributes. Using automated binning with depth-1 XGBoost stumps, we compare three encoding strategies: logistic regression with one-hot encoding, WoE transformation, and constrained XGBoost. All methods achieve comparable predictive performance (AUC 0.82-0.84), demonstrating that principled, information-theoretic binning outweighs encoding choice. Mixed-integer programming traces Pareto-efficient solutions along the performance-fairness frontier with uncertainty quantification. This framework bridges theory and practice, providing the first rigorous statistical foundation for widely-used credit risk metrics while offering principled tools for balancing accuracy and fairness in regulated environments.
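
论文的核心恒等式,即在相同分箱上对好/坏分布计算的PSI恰好等于IV,可以用几行代码直接验证(分箱计数为示例数据):

import numpy as np

def woe_iv(good_counts, bad_counts):
    """按分箱计算 WoE 与 IV:WoE_i = ln(g_i / b_i),
    IV = sum((g_i - b_i) * WoE_i),其中 g_i、b_i 为各箱占比。"""
    g = good_counts / good_counts.sum()
    b = bad_counts / bad_counts.sum()
    woe = np.log(g / b)
    return woe, ((g - b) * woe).sum()

def psi(expected, actual):
    """PSI(Jeffreys 散度)与 IV 形式完全相同,只是比较的分布不同。"""
    e = expected / expected.sum()
    a = actual / actual.sum()
    return ((a - e) * np.log(a / e)).sum()

good = np.array([400, 300, 200, 100])   # 各分箱中好客户数(示例数据)
bad = np.array([50, 100, 150, 200])     # 各分箱中坏客户数
woe, iv = woe_iv(good, bad)
assert np.isclose(iv, psi(good, bad))   # 在同一分箱上,IV 恰等于 PSI
print(f"IV = PSI = {iv:.4f}")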


【8】Machine-learning competition to grade EEG background patterns in newborns with hypoxic-ischaemic encephalopathy
标题:对缺氧缺血性脑病新生儿的脑电背景模式进行评分的机器学习竞赛
链接:https://arxiv.org/abs/2509.09695

作者:arelli, Geraldine B. Boylan, Saeed Montazeri, Feargal O'Sullivan, Dominic Lightbody, Minoo Ashoori, Tamara Skoric Ceranic, John M. O'Toole
备注:29 pages, supplementary materials: "supplementary materials ML this http URL"
摘要:机器学习(ML)有潜力支持并提升专家在监测高危新生儿脑功能方面的表现。开发准确可靠的ML模型依赖于获取高质量的标注数据,而这是一种稀缺资源。机器学习竞赛通过为研究人员提供专家标注的数据集、借助直接的模型比较促进共享学习,并发挥众包多元专业知识的优势,来满足这一需求。我们汇编了一个回顾性数据集,包含来自一项多中心研究的102名新生儿共353小时的EEG。数据经过完全匿名化,并划分为训练、测试和保留验证数据集。EEG按异常背景模式的严重程度分级。随后,我们创建了一个基于网络的竞赛平台,并举办了一场机器学习竞赛,以开发对新生儿EEG背景模式严重程度进行分类的ML模型。比赛结束后,排名前4的模型在单独的保留验证数据集上进行了离线评估。虽然一个基于特征的模型在测试数据集上排名第一,但深度学习模型在验证集上的泛化表现更好。与测试性能相比,所有方法的验证性能都显著下降。这凸显了模型在未见数据上泛化的挑战,强调了在新生儿EEG的ML研究中设置保留验证数据集的必要性。该研究强调了在大型多样化数据集上训练ML模型以确保稳健泛化的重要性。竞赛结果表明,开放获取数据与协作式ML开发有潜力营造协作研究环境,并加速新生儿神经监测临床决策支持工具的开发。
摘要:Machine learning (ML) has the potential to support and improve expert performance in monitoring the brain function of at-risk newborns. Developing accurate and reliable ML models depends on access to high-quality, annotated data, a resource in short supply. ML competitions address this need by providing researchers access to expertly annotated datasets, fostering shared learning through direct model comparisons, and leveraging the benefits of crowdsourcing diverse expertise. We compiled a retrospective dataset containing 353 hours of EEG from 102 individual newborns from a multi-centre study. The data was fully anonymised and divided into training, testing, and held-out validation datasets. EEGs were graded for the severity of abnormal background patterns. Next, we created a web-based competition platform and hosted a machine learning competition to develop ML models for classifying the severity of EEG background patterns in newborns. After the competition closed, the top 4 performing models were evaluated offline on a separate held-out validation dataset. Although a feature-based model ranked first on the testing dataset, deep learning models generalised better on the validation sets. All methods had a significant decline in validation performance compared to the testing performance. This highlights the challenges for model generalisation on unseen data, emphasising the need for held-out validation datasets in ML studies with neonatal EEG. The study underscores the importance of training ML models on large and diverse datasets to ensure robust generalisation. The competition's outcome demonstrates the potential for open-access data and collaborative ML development to foster a collaborative research environment and expedite the development of clinical decision-support tools for neonatal neuromonitoring.


其他(24篇)

【1】Run-Time Monitoring of ERTMS/ETCS Control Flow by Process Mining
标题:通过流程挖掘对ERTMS/ETCS控制流的运行时监控
链接:https://arxiv.org/abs/2509.10419

作者: Vitale, Tommaso Zoppi, Francesco Flammini, Nicola Mazzocca
备注:Accepted to the 6th International Conference on Reliability, Safety, and Security of Railway Systems (RSSRail2025)
摘要:随着基于计算机的铁路系统日益复杂和关键,确保其弹性以应对不确定性和变化变得越来越重要。尽管其软件依赖于遵循成熟最佳实践和认证标准的严格验证与确认流程,但由于残余故障、设计时未知的系统和环境变更,或其他新出现的网络威胁场景,运行时仍可能出现异常。本文探讨了使用流程挖掘的运行时控制流异常检测,以增强ERTMS/ETCS L2(欧洲铁路交通管理系统/欧洲列车控制系统2级)的弹性。流程挖掘可以从系统的执行轨迹中学习其实际控制流,从而通过在线一致性检查实现运行时监控。此外,通过无监督机器学习进行异常定位,将相关偏差关联到关键系统组件。我们在一个参考ERTMS/ETCS L2场景,即RBC/RBC切换(Handover)上测试了我们的方法,展示了其以高准确性、高效率和良好可解释性检测和定位异常的能力。
摘要:Ensuring the resilience of computer-based railways is increasingly crucial to account for uncertainties and changes due to the growing complexity and criticality of those systems. Although their software relies on strict verification and validation processes following well-established best-practices and certification standards, anomalies can still occur at run-time due to residual faults, system and environmental modifications that were unknown at design-time, or other emergent cyber-threat scenarios. This paper explores run-time control-flow anomaly detection using process mining to enhance the resilience of ERTMS/ETCS L2 (European Rail Traffic Management System / European Train Control System Level 2). Process mining allows learning the actual control flow of the system from its execution traces, thus enabling run-time monitoring through online conformance checking. In addition, anomaly localization is performed through unsupervised machine learning to link relevant deviations to critical system components. We test our approach on a reference ERTMS/ETCS L2 scenario, namely the RBC/RBC Handover, to show its capability to detect and localize anomalies with high accuracy, efficiency, and explainability.


【2】Multipole Semantic Attention: A Fast Approximation of Softmax Attention for Pretraining
标题:多极语义注意力:用于预训练的Softmax注意力的快速逼近
链接:https://arxiv.org/abs/2509.10406

作者:tchell, Kristian Kersting
摘要:我们提出了多极语义注意力(MuSe),一种将语义聚类与计算物理中的多极展开相结合的softmax注意力高效近似。我们的方法通过在学习到的表示空间中分别对查询和键进行聚类,实现分层的两阶段注意力机制,从而缓解Transformer在上下文长度上的二次计算复杂度。与先前仅对键分组或使用统一聚类的方法不同,我们保持相互独立的聚类,以尊重注意力对这两个空间的不对称处理。我们用偶极校正增强基于质心的(单极)近似,该校正捕获簇内的方向方差,从而在训练期间保留更丰富的信息。该方法可作为标准注意力的直接替代,只需指定超参数而无需修改架构。我们的方法为非因果注意力实现了$\mathcal{O}(NCD)$的复杂度($C$为簇数),为因果注意力实现了$\mathcal{O}(NCD \log N)$的复杂度。在孤立的注意力层上,我们展示了在8k上下文长度下相对CUDNN Flash Attention的$3\times$加速,相对平方误差低于20%。对于因果注意力,我们开发了一种分层块分解,将精确的局部计算与高效的长程近似相结合。在16k上下文的书籍长度文本上对一个30M参数模型进行端到端预训练时,我们实现了12.2%的运行时间缩减,而损失仅恶化0.36%,从而确立了多极近似用于高效Transformer预训练的可行性。
摘要:We present Multipole Semantic Attention (MuSe), an efficient approximation of softmax attention that combines semantic clustering with multipole expansions from computational physics. Our method addresses the quadratic computational complexity of transformers in the context length by clustering queries and keys separately in their learned representation spaces, enabling a hierarchical two-stage attention mechanism. Unlike prior clustering approaches that group only keys or use unified clustering, we maintain separate clusterings that respect attention's asymmetric treatment of these spaces. We augment centroid-based (monopole) approximations with dipole corrections that capture directional variance within clusters, preserving richer information during training. The method operates as a drop-in replacement for standard attention, requiring only hyperparameter specification without architectural modifications. Our approach achieves $\mathcal{O}(NCD)$ complexity for acausal attention with $C$ clusters and $\mathcal{O}(NCD \log N)$ for causal attention. On isolated attention layers, we demonstrate $3\times$ speedup over CUDNN Flash Attention at 8k context length, with relative squared errors below 20%. For causal attention, we develop a hierarchical block decomposition that combines exact local computation with efficient long-range approximation. In end-to-end pretraining of a 30M parameter model on book-length texts with 16k context, we achieve 12.2% runtime reduction with only 0.36% loss degradation, establishing the viability of multipole approximations for efficient transformer pretraining.
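
若只保留"单极"(质心)项,MuSe式的聚类注意力可以示意如下:对键做k-means,把softmax改为在质心上计算并按簇大小加权,即 $\sum_j e^{q\cdot k_j} v_j \approx \sum_c n_c e^{q\cdot c}\bar v_c$ 的单极近似。偶极校正、查询侧聚类与因果分块等论文要点均未包含:

import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def monopole_attention(q, k, v, n_clusters=32):
    """仅含质心项的聚类注意力草图(假设性简化)。q: (N, D), k/v: (M, D)。"""
    km = KMeans(n_clusters=n_clusters, n_init=4).fit(k.numpy())
    c = torch.from_numpy(km.cluster_centers_).float()        # (C, D) 键质心
    labels = torch.from_numpy(km.labels_).long()
    # 每个簇的成员数与值向量均值
    sizes = torch.bincount(labels, minlength=n_clusters).float()
    v_sum = torch.zeros(n_clusters, v.shape[-1]).index_add_(0, labels, v)
    v_mean = v_sum / sizes.clamp(min=1).unsqueeze(-1)
    # 对质心做 softmax,并以 log(簇大小) 作为加权偏置
    logits = q @ c.T / q.shape[-1] ** 0.5 + sizes.log()      # (N, C)
    return F.softmax(logits, dim=-1) @ v_mean

out = monopole_attention(torch.randn(128, 64), torch.randn(1024, 64),
                         torch.randn(1024, 64))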


【3】Flow Straight and Fast in Hilbert Space: Functional Rectified Flow
标题:在希尔伯特空间中又直又快地流动:泛函整流流
链接:https://arxiv.org/abs/2509.10384

作者:hang, Clayton Scott
摘要:许多最初在有限维欧几里得空间中发展起来的生成模型,在无限维设置中都有泛函推广。然而,整流流(rectified flow)向无限维空间的扩展仍未被探索。在这项工作中,我们在无限维希尔伯特空间中建立了整流流的严格泛函表述。我们的方法建立在无限维空间中连续性方程的叠加原理之上。我们进一步表明,该框架可以自然地扩展到泛函流匹配和泛函概率流常微分方程,并将它们解释为整流流的非线性推广。值得注意的是,我们对泛函流匹配的扩展消除了\citet{kerrigan2024functional}现有理论中限制性的测度论假设。此外,我们通过实验证明,与现有的泛函生成模型相比,我们的方法取得了更优的性能。
摘要:Many generative models originally developed in finite-dimensional Euclidean space have functional generalizations in infinite-dimensional settings. However, the extension of rectified flow to infinite-dimensional spaces remains unexplored. In this work, we establish a rigorous functional formulation of rectified flow in an infinite-dimensional Hilbert space. Our approach builds upon the superposition principle for continuity equations in an infinite-dimensional space. We further show that this framework extends naturally to functional flow matching and functional probability flow ODEs, interpreting them as nonlinear generalizations of rectified flow. Notably, our extension to functional flow matching removes the restrictive measure-theoretic assumptions in the existing theory of \citet{kerrigan2024functional}. Furthermore, we demonstrate experimentally that our method achieves superior performance compared to existing functional generative models.
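
作为背景,下面给出有限维欧氏空间中整流流的标准训练目标:学习速度场 $v(x_t,t)$ 去拟合直线插值 $x_t=(1-t)x_0+tx_1$ 的恒定速度 $x_1-x_0$。论文的贡献在于将这一构造严格推广到无限维希尔伯特空间,此处仅为有限维示意:

import torch
import torch.nn as nn

dim = 2
v_net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))
opt = torch.optim.Adam(v_net.parameters(), lr=1e-3)

for _ in range(1000):
    x0 = torch.randn(256, dim)              # 源分布(噪声)
    x1 = torch.randn(256, dim) * 0.3 + 2.0  # 目标分布(示例)
    t = torch.rand(256, 1)
    x_t = (1 - t) * x0 + t * x1             # 直线插值
    loss = ((v_net(torch.cat([x_t, t], -1)) - (x1 - x0)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()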


【4】Characterizing the Efficiency of Distributed Training: A Power, Performance, and Thermal Perspective
标题:刻画分布式训练的效率:功耗、性能与散热视角
链接:https://arxiv.org/abs/2509.10371

作者:o, Joongun Park, Spandan More, Hanjiang Wu, Irene Wang, Aaron Jezghani, Tushar Krishna, Divya Mahajan
摘要:大型语言模型(LLM)的快速扩展已将训练工作负载推到远超单节点分析的范围,需要更深入地理解这些模型在大规模多GPU系统上的行为。在本文中,我们对LLM训练在各种真实工作负载和硬件平台(包括NVIDIA H100/H200和AMD MI250 GPU)上的特性进行了全面刻画。我们分析了各种并行策略(张量、流水线、数据和专家并行)下的稠密和稀疏模型,并评估了它们对硬件利用率、功耗和热行为的影响。我们进一步评估了激活重计算和计算-通信重叠等优化的有效性。我们的研究结果表明,性能并非仅由硬件容量的扩展决定。在通信受限的情形下,采用更少、更大显存GPU的纵向扩展(scale-up)系统可以胜过横向扩展(scale-out)系统,但前提是配置经过精心调优;在其他情形下,横向扩展部署能实现更高的吞吐量。我们还表明,某些并行组合(如张量与流水线并行)会因低效的数据分块导致带宽利用不足,而将微批量大小增加到超过某个临界点会引发突发式执行和峰值功率偏移,加剧热节流。这些洞见揭示了训练性能如何由硬件、系统拓扑和模型执行之间的复杂交互所塑造。最后,我们为系统和硬件设计提供建议,以提高未来LLM系统和工作负载的可扩展性与可靠性。本项目的源代码可在https://github.com/sitar-lab/CharLLM-PPT上获得。
摘要:The rapid scaling of Large Language Models (LLMs) has pushed training workloads far beyond the limits of single-node analysis, demanding a deeper understanding of how these models behave across large-scale, multi-GPU systems. In this paper, we present a comprehensive characterization of LLM training across diverse real-world workloads and hardware platforms, including NVIDIA H100/H200 and AMD MI250 GPUs. We analyze dense and sparse models under various parallelism strategies -- tensor, pipeline, data, and expert -- and evaluate their effects on hardware utilization, power consumption, and thermal behavior. We further evaluate the effectiveness of optimizations such as activation recomputation and compute-communication overlap. Our findings show that performance is not determined solely by scaling hardware capacity. Scale-up systems with fewer, higher-memory GPUs can outperform scale-out systems in communication-bound regimes, but only under carefully tuned configurations; in other cases, scale-out deployments achieve superior throughput. We also show that certain parallelism combinations, such as tensor with pipeline, lead to bandwidth underutilization due to inefficient data chunking, while increasing microbatch sizes beyond a certain point induces bursty execution and peak power excursions that worsen thermal throttling. These insights reveal how training performance is shaped by complex interactions between hardware, system topology, and model execution. We conclude by offering recommendations for system and hardware design to improve the scalability and reliability of future LLM systems and workloads. The source code of this project is available at https://github.com/sitar-lab/CharLLM-PPT.


【5】A Discrepancy-Based Perspective on Dataset Condensation
标题:基于差异的数据集压缩视角
链接:https://arxiv.org/abs/2509.10367

作者:, Raghavendra Selvan
备注:30 pages, 4 tables, 1 figure
摘要:Given a dataset of finitely many elements $\mathcal{T} = \{\mathbf{x}_i\}_{i = 1}^N$, the goal of dataset condensation (DC) is to construct a synthetic dataset $\mathcal{S} = \{\tilde{\mathbf{x}}_j\}_{j = 1}^M$ which is significantly smaller ($M \ll N$) such that a model trained from scratch on $\mathcal{S}$ achieves comparable or even superior generalization performance to a model trained on $\mathcal{T}$. Recent advances in DC reveal a close connection to the problem of approximating the data distribution represented by $\mathcal{T}$ with a reduced set of points. In this work, we present a unified framework that encompasses existing DC methods and extend the task-specific notion of DC to a more general and formal definition using notions of discrepancy, which quantify the distance between probability distribution in different regimes. Our framework broadens the objective of DC beyond generalization, accommodating additional objectives such as robustness, privacy, and other desirable properties.
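As a crude instance of the discrepancy view, the sketch below optimizes a small synthetic set so that one simple statistic (the feature mean) matches the real data's. Actual DC methods minimize much richer discrepancies, so this is only a schematic of the framing, with random stand-in features.

import torch

real = torch.randn(1000, 32)                     # stand-in features for T, N = 1000
syn = torch.randn(20, 32, requires_grad=True)    # synthetic set S, M = 20 << N
opt = torch.optim.Adam([syn], lr=0.01)
for _ in range(500):
    # Squared distance between mean embeddings: a minimal discrepancy D(T, S).
    loss = (real.mean(0) - syn.mean(0)).pow(2).sum()
    opt.zero_grad(); loss.backward(); opt.step()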


【6】GLAM: Geometry-Guided Local Alignment for Multi-View VLP in Mammography
标题:GLAM:乳房X光检查中多视图VLP的几何引导局部对齐
链接:https://arxiv.org/abs/2509.10344

作者: Lihui Chen, Nicha C. Dvornek
备注:Accepted by MICCAI 2025
摘要:乳腺X线筛查是早期发现乳腺癌的重要工具,其判读速度和准确性有望借助深度学习方法得到提升。然而,基础视觉语言模型(VLM)的发展受限于数据不足以及自然图像与医学图像之间的领域差异。现有的乳腺X线摄影VLM多改编自自然图像模型,往往忽略领域特有的性质,例如乳腺X线摄影中的多视图关系。放射科医生会同时分析两个视图以处理同侧对应关系,而当前方法要么将它们视为相互独立的图像,要么未能正确建模多视图对应学习,从而丢失关键的几何上下文,导致次优预测。我们提出GLAM:利用几何引导进行VLM预训练的多视图乳腺X线摄影全局与局部对齐方法。通过利用关于乳腺X线照片多视图成像过程的先验知识,我们的模型经由联合的全局与局部、视觉-视觉以及视觉-语言对比学习,学习局部跨视图对齐和细粒度局部特征。在最大的开放乳腺X线摄影数据集之一EMBED [14]上预训练后,我们的模型在不同设置下的多个数据集上均优于基线。
摘要:Mammography screening is an essential tool for early detection of breast cancer. The speed and accuracy of mammography interpretation have the potential to be improved with deep learning methods. However, the development of a foundation visual language model (VLM) is hindered by limited data and domain differences between natural and medical images. Existing mammography VLMs, adapted from natural images, often ignore domain-specific characteristics, such as multi-view relationships in mammography. Unlike radiologists who analyze both views together to process ipsilateral correspondence, current methods treat them as independent images or do not properly model the multi-view correspondence learning, losing critical geometric context and resulting in suboptimal prediction. We propose GLAM: Global and Local Alignment for Multi-view mammography for VLM pretraining using geometry guidance. By leveraging the prior knowledge about the multi-view imaging process of mammograms, our model learns local cross-view alignments and fine-grained local features through joint global and local, visual-visual, and visual-language contrastive learning. Pretrained on EMBED [14], one of the largest open mammography datasets, our model outperforms baselines across multiple datasets under different settings.


【7】Proof of AutoML: SDN based Secure Energy Trading with Blockchain in Disaster Case
标题:AutoML证明:灾难场景下基于SDN与区块链的安全能源交易
链接:https://arxiv.org/abs/2509.10291

作者:rak, Muge Erel-Ozcevik
备注:6 pages, 3 figures, 7th International Conference on Blockchain Computing and Applications (BCCA 2025), ©2025 IEEE
摘要:在传统能源基础设施受损的灾难场景中,太阳能家庭与移动充电单元之间安全且可追溯的能源交易成为必需。为了确保区块链网络上此类交易的完整性,健壮且不可预测的随机数(nonce)生成至关重要。本研究提出一种支持SDN的架构,其中利用机器学习回归器并非为了其准确性,而是为了其生成适合作为nonce候选的随机值的潜力,因此该方案被命名为"AutoML证明"(Proof of AutoML)。这里,SDN即使在碎片化或降级的网络中也能灵活控制数据流和能源路由策略,确保紧急情况下的自适应响应。基于一个9000样本的数据集,我们评估了五个由AutoML选出的回归模型:梯度提升、LightGBM、随机森林、Extra Trees和K近邻,评估标准不是预测准确率,而是它们在打乱的数据输入下产生多样化、非确定性输出的能力。随机性分析显示,随机森林和Extra Trees回归器表现出完全的随机性,而梯度提升、K近邻和LightGBM的随机性得分较强但略低(分别为97.6%、98.8%和99.9%)。这些发现表明,某些机器学习模型,特别是基于树的集成,可以在适应灾难条件、由区块链保障安全的基于SDN的能源交易基础设施中充当有效且轻量的nonce生成器。
摘要:In disaster scenarios where conventional energy infrastructure is compromised, secure and traceable energy trading between solar-powered households and mobile charging units becomes a necessity. To ensure the integrity of such transactions over a blockchain network, robust and unpredictable nonce generation is vital. This study proposes an SDN-enabled architecture where machine learning regressors are leveraged not for their accuracy, but for their potential to generate randomized values suitable as nonce candidates. Therefore, it is newly called Proof of AutoML. Here, SDN allows flexible control over data flows and energy routing policies even in fragmented or degraded networks, ensuring adaptive response during emergencies. Using a 9000-sample dataset, we evaluate five AutoML-selected regression models - Gradient Boosting, LightGBM, Random Forest, Extra Trees, and K-Nearest Neighbors - not by their prediction accuracy, but by their ability to produce diverse and non-deterministic outputs across shuffled data inputs. Randomness analysis reveals that Random Forest and Extra Trees regressors exhibit complete dependency on randomness, whereas Gradient Boosting, K-Nearest Neighbors and LightGBM show strong but slightly lower randomness scores (97.6%, 98.8% and 99.9%, respectively). These findings highlight that certain machine learning models, particularly tree-based ensembles, may serve as effective and lightweight nonce generators within blockchain-secured, SDN-based energy trading infrastructures resilient to disaster conditions.
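Our reading of the nonce-generation idea, as a hedged sketch: fit a tree ensemble on shuffled data and hash its predictions into a nonce candidate. The dataset and the hashing step below are placeholders, not the authors' pipeline.

import hashlib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng()
X, y = rng.random((9000, 8)), rng.random(9000)     # toy 9000-sample dataset
idx = rng.permutation(len(X))                      # shuffled input ordering
model = RandomForestRegressor(n_estimators=50).fit(X[idx], y[idx])
preds = model.predict(X[:64])                      # varies across refits/shuffles
nonce = hashlib.sha256(preds.tobytes()).hexdigest()
print(nonce)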


【8】Targeted Test Selection Approach in Continuous Integration
标题:持续集成中的目标测试选择方法
链接:https://arxiv.org/abs/2509.10279

作者:usnin, Aleksey Antonov, Vasilii Ermakov, Aleksandr Khaybriev, Margarita Kikot, Ilseyar Alimova, Stanislav Moiseev
备注:Accepted at ICSME 2025
摘要:在现代软件开发中,基于变更的测试起着至关重要的作用。然而,随着代码库的扩展和测试套件的增长,高效管理测试过程变得越来越具有挑战性,特别是考虑到日常代码提交的高频率。我们提出了面向工业测试选择的机器学习方法:目标测试选择(T-TS)。我们的关键创新是一种数据表示:将提交表示为已更改文件的词袋,并结合跨文件及其他预测特征,同时特意避免使用覆盖率图。T-TS部署在生产环境中后,使用内部和公共数据集,按照行业标准和最新方法进行了全面评估,度量时间效率和故障检测能力。在真实工业数据上,T-TS仅选择15%的测试,将执行时间缩短了$5.9\times$,将流水线加速了$5.6\times$,并检测到超过95%的测试失败。该实现已公开,以支持进一步的研究和实际采用。
摘要 :In modern software development change-based testing plays a crucial role. However, as codebases expand and test suites grow, efficiently managing the testing process becomes increasingly challenging, especially given the high frequency of daily code commits. We propose Targeted Test Selection (T-TS), a machine learning approach for industrial test selection. Our key innovation is a data representation that represent commits as Bags-of-Words of changed files, incorporates cross-file and additional predictive features, and notably avoids the use of coverage maps. Deployed in production, T-TS was comprehensively evaluated against industry standards and recent methods using both internal and public datasets, measuring time efficiency and fault detection. On live industrial data, T-TS selects only 15% of tests, reduces execution time by $5.9\times$, accelerates the pipeline by $5.6\times$, and detects over 95% of test failures. The implementation is publicly available to support further research and practical adoption.
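A minimal sketch of the commit representation, with invented file paths and labels: each commit becomes a bag-of-words over its changed file paths, and a classifier scores it for failure likelihood. The cross-file features T-TS adds are omitted.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import GradientBoostingClassifier

commits = [                                    # changed file paths per commit
    "src/core/io.py src/utils/log.py",
    "src/net/http.py",
    "src/core/io.py tests/test_io.py",
    "docs/readme.md",
]
labels = [1, 0, 1, 0]                          # 1 = some test failed afterwards
vec = CountVectorizer(token_pattern=r"[^ ]+")  # bag-of-words over whole paths
X = vec.fit_transform(commits)
clf = GradientBoostingClassifier().fit(X.toarray(), labels)
print(clf.predict_proba(vec.transform(["src/core/io.py"]).toarray()))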


【9】RFSeek and Ye Shall Find
标题:RFSeek:寻找,就寻见
链接:https://arxiv.org/abs/2509.10216

作者:otman, Tiago Ferreira, Hila Peleg, Mark Silberstein, Alexandra Silva
备注:7 pages
摘要:RFC(Request for Comments)是网络协议的详尽规范文档,但其散文式格式和可观的篇幅往往妨碍对协议运行的准确理解。我们提出RFSeek,一个自动从RFC中提取协议逻辑可视化摘要的交互式工具。RFSeek利用大型语言模型(LLM)生成与出处关联、可供探索的图,同时呈现官方状态机和仅见于RFC正文的附加逻辑。与现有的RFC可视化相比,RFSeek的可视化摘要更透明,也更容易对照其文本来源进行审计。我们通过一系列用例展示了该工具的潜力,包括引导式知识提取和语义差分(semantic diffing),并应用于TCP、QUIC、PPTP和DCCP等协议。在实践中,RFSeek不仅重建了部分规范中已包含的RFC图,更有趣的是,还发现了文本中描述但这些图中缺失的重要逻辑,如节点或边。RFSeek还为复杂RFC推导出新的可视化图,QUIC是代表性案例。我们将这一方法称为"摘要可视化",它凸显了一个有前途的方向:将LLM与形式化的、用户可定制的可视化相结合,以增强协议理解并支持健壮的实现。
摘要:Requests for Comments (RFCs) are extensive specification documents for network protocols, but their prose-based format and their considerable length often impede precise operational understanding. We present RFSeek, an interactive tool that automatically extracts visual summaries of protocol logic from RFCs. RFSeek leverages large language models (LLMs) to generate provenance-linked, explorable diagrams, surfacing both official state machines and additional logic found only in the RFC text. Compared to existing RFC visualizations, RFSeek's visual summaries are more transparent and easier to audit against their textual source. We showcase the tool's potential through a series of use cases, including guided knowledge extraction and semantic diffing, applied to protocols such as TCP, QUIC, PPTP, and DCCP.   In practice, RFSeek not only reconstructs the RFC diagrams included in some specifications, but, more interestingly, also uncovers important logic such as nodes or edges described in the text but missing from those diagrams. RFSeek further derives new visualization diagrams for complex RFCs, with QUIC as a representative case. Our approach, which we term \emph{Summary Visualization}, highlights a promising direction: combining LLMs with formal, user-customized visualizations to enhance protocol comprehension and support robust implementations.


【10】The Hidden Width of Deep ResNets: Tight Error Bounds and Phase Diagrams
标题:深度ResNet的隐藏宽度:紧误差界限和相图
链接:https://arxiv.org/abs/2509.10167

作者:izat
摘要:We study the gradient-based training of large-depth residual networks (ResNets) from standard random initializations. We show that with a diverging depth $L$, a fixed embedding dimension $D$, and an arbitrary hidden width $M$, the training dynamics converges to a Neural Mean ODE training dynamics. Remarkably, the limit is independent of the scaling of $M$, covering practical cases of, say, Transformers, where $M$ (the number of hidden units or attention heads per layer) is typically of the order of $D$. For a residual scale $\Theta_D\big(\frac{\alpha}{LM}\big)$, we obtain the error bound $O_D\big(\frac{1}{L}+ \frac{\alpha}{\sqrt{LM}}\big)$ between the model's output and its limit after a fixed number gradient of steps, and we verify empirically that this rate is tight. When $\alpha=\Theta(1)$, the limit exhibits complete feature learning, i.e. the Mean ODE is genuinely non-linearly parameterized. In contrast, we show that $\alpha \to \infty$ yields a \lazy ODE regime where the Mean ODE is linearly parameterized. We then focus on the particular case of ResNets with two-layer perceptron blocks, for which we study how these scalings depend on the embedding dimension $D$. We show that for this model, the only residual scale that leads to complete feature learning is $\Theta\big(\frac{\sqrt{D}}{LM}\big)$. In this regime, we prove the error bound $O\big(\frac{1}{L}+ \frac{\sqrt{D}}{\sqrt{LM}}\big)$ between the ResNet and its limit after a fixed number of gradient steps, which is also empirically tight. Our convergence results rely on a novel mathematical perspective on ResNets : (i) due to the randomness of the initialization, the forward and backward pass through the ResNet behave as the stochastic approximation of certain mean ODEs, and (ii) by propagation of chaos (that is, asymptotic independence of the units) this behavior is preserved through the training dynamics.


【11】A Symmetry-Integrated Approach to Surface Code Decoding
标题:一种集成对称性的表面码解码方法
链接:https://arxiv.org/abs/2509.10164

作者: Ohnishi, Hideo Mukai
备注:12 pages, 6 figures
摘要:量子纠错利用编码为冗余多物理量子比特的逻辑量子比特来发现并纠正物理量子比特中的错误,对于实用量子计算不可或缺。表面码被认为是一种有前途的编码方法,具有由稳定子生成元定义的高错误阈值。然而,以往的方法存在一个问题:由于从输入得到的正确预测不唯一,解码器只能学到错误概率分布。为了回避这个问题,我们提出了一种技术,用神经网络数学插值的连续函数来近似错误征候(syndrome)测量,从而重新优化解码器模型。我们评估了基于多层感知机的解码器在码距为5和7时的精度提升,以及基于卷积和循环神经网络以及Transformer的解码器在码距为5时的精度提升。在所有情况下,重新优化后的解码器都比原始模型给出了更好的精度,证明了所提方法不依赖于码距或网络架构的普遍有效性。这些结果表明,将表面码解码问题重新表述为可用深度学习解决的回归问题是一种有用的策略。
摘要:Quantum error correction, which utilizes logical qubits that are encoded as redundant multiple physical qubits to find and correct errors in physical qubits, is indispensable for practical quantum computing. Surface code is considered to be a promising encoding method with a high error threshold that is defined by stabilizer generators. However, previous methods have suffered from the problem that the decoder acquires solely the error probability distribution because of the non-uniqueness of correct prediction obtained from the input. To circumvent this problem, we propose a technique to reoptimize the decoder model by approximating syndrome measurements with a continuous function that is mathematically interpolated by neural network. We evaluated the improvement in accuracy of a multilayer perceptron based decoder for code distances of 5 and 7 as well as for decoders based on convolutional and recurrent neural networks and transformers for a code distance of 5. In all cases, the reoptimized decoder gave better accuracy than the original models, demonstrating the universal effectiveness of the proposed method that is independent of code distance or network architecture. These results suggest that re-framing the problem of surface code decoding into a regression problem that can be tackled by deep learning is a useful strategy.


【12】Neural Scaling Laws for Deep Regression
标题:深度回归的神经缩放定律
链接:https://arxiv.org/abs/2509.10000

作者:ez, Kyoung-Min Kim
备注:Supplementary Information will be provided with the published manuscript
摘要:神经标度律--泛化误差与深度学习模型特征之间的幂律关系--是在管理有限资源的同时开发可靠模型的重要工具。尽管大型语言模型的成功凸显了这些定律的重要性,但它们在深度回归模型中的应用在很大程度上仍未得到探索。在这里,我们以扭曲范德华磁体的参数估计模型为例,对深度回归中的神经标度律进行了实证研究。在很宽的取值范围内,我们观察到损失与训练数据集大小及模型容量之间的幂律关系,所用架构包括全连接网络、残差网络和视觉Transformer。此外,控制这些关系的标度指数介于1到2之间,具体取值取决于所回归的参数和模型细节。一致的标度行为及其较大的标度指数表明,深度回归模型的性能可以随着数据规模的增加而大幅提升。
摘要:Neural scaling laws--power-law relationships between generalization errors and characteristics of deep learning models--are vital tools for developing reliable models while managing limited resources. Although the success of large language models highlights the importance of these laws, their application to deep regression models remains largely unexplored. Here, we empirically investigate neural scaling laws in deep regression using a parameter estimation model for twisted van der Waals magnets. We observe power-law relationships between the loss and both training dataset size and model capacity across a wide range of values, employing various architectures--including fully connected networks, residual networks, and vision transformers. Furthermore, the scaling exponents governing these relationships range from 1 to 2, with specific values depending on the regressed parameters and model details. The consistent scaling behaviors and their large scaling exponents suggest that the performance of deep regression models can improve substantially with increasing data size.
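The exponent extraction behind such studies is essentially a one-line fit: a power law L(N) = a * N^(-alpha) is linear in log-log space. The sketch below recovers alpha from synthetic loss values; the data are invented for illustration.

import numpy as np

N = np.array([1e3, 3e3, 1e4, 3e4, 1e5])            # training-set sizes
loss = 5.0 * N ** -1.5 + 0.01 * np.random.rand(5)  # synthetic power-law losses
slope, _ = np.polyfit(np.log(N), np.log(loss), 1)  # linear fit in log-log space
print("estimated exponent:", -slope)               # ~1.5, inside the reported 1-2 range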


【13】SciML Agents: Write the Solver, Not the Solution
标题:SciML代理:编写求解器,而不是解决方案
链接:https://arxiv.org/abs/2509.09936

作者:onkar, Xiang Zheng, Haocheng Xi, Rishabh Tiwari, Kurt Keutzer, Dmitriy Morozov, Michael W. Mahoney, Amir Gholami
摘要:科学机器学习的最新工作旨在通过用神经网络预测目标值来直接处理科学任务(例如,物理信息神经网络、神经ODE、神经算子等),但要获得高精度和鲁棒性颇具挑战。我们探索另一种视角:使用LLM编写代码,从而利用数十年积累的数值算法。这将负担从学习解函数转移到做出领域感知的数值选择。我们提出的问题是:给定自然语言的ODE描述,LLM能否充当SciML代理,生成科学上合适的可运行代码,选择合适的求解器(刚性与非刚性),并执行稳定性检查。目前还没有基准来衡量科学计算任务中的这种能力。因此,我们首先引入两个新的数据集:一个由对抗性"误导"问题构成的诊断数据集,以及一个包含1,000个不同ODE任务的大规模基准。诊断集包含表面特征暗示刚性、但经代数化简可证明非刚性的问题;大规模基准则覆盖刚性和非刚性两类ODE。我们沿两个轴评估开源和闭源LLM模型:(i)无指导提示与带有领域知识的有指导提示;(ii)现成模型与微调模型。我们的评估同时度量可执行性和相对参考解的数值有效性。我们发现,在足够的上下文和引导式提示下,较新的指令跟随模型在两个标准上都达到了高精度。在许多情况下,近期的开源系统无需微调即表现强劲,而较旧或较小的模型仍能从微调中获益。总体而言,我们的初步结果表明,细致的提示和微调可以产生一个能够可靠求解简单ODE问题的专用LLM代理。
摘要:Recent work in scientific machine learning aims to tackle scientific tasks directly by predicting target values with neural networks (e.g., physics-informed neural networks, neural ODEs, neural operators, etc.), but attaining high accuracy and robustness has been challenging. We explore an alternative view: use LLMs to write code that leverages decades of numerical algorithms. This shifts the burden from learning a solution function to making domain-aware numerical choices. We ask whether LLMs can act as SciML agents that, given a natural-language ODE description, generate runnable code that is scientifically appropriate, selecting suitable solvers (stiff vs. non-stiff), and enforcing stability checks. There is currently no benchmark to measure this kind of capability for scientific computing tasks. As such, we first introduce two new datasets: a diagnostic dataset of adversarial "misleading" problems; and a large-scale benchmark of 1,000 diverse ODE tasks. The diagnostic set contains problems whose superficial appearance suggests stiffness, and that require algebraic simplification to demonstrate non-stiffness; and the large-scale benchmark spans stiff and non-stiff ODE regimes. We evaluate open- and closed-source LLM models along two axes: (i) unguided versus guided prompting with domain-specific knowledge; and (ii) off-the-shelf versus fine-tuned variants. Our evaluation measures both executability and numerical validity against reference solutions. We find that with sufficient context and guided prompts, newer instruction-following models achieve high accuracy on both criteria. In many cases, recent open-source systems perform strongly without fine-tuning, while older or smaller models still benefit from fine-tuning. Overall, our preliminary results indicate that careful prompting and fine-tuning can yield a specialized LLM agent capable of reliably solving simple ODE problems.
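The kind of domain-aware choice the benchmark probes can be shown with SciPy: a system with widely separated rates is stiff, so an implicit solver is the scientifically appropriate pick. A minimal sketch with an invented toy ODE:

from scipy.integrate import solve_ivp

def f(t, y):
    # Widely separated time scales (factor ~1000) make this system stiff.
    return [-1000.0 * y[0] + y[1], y[0] - y[1]]

# Implicit method for stiffness; the explicit default method="RK45" would
# need tiny steps here, but is the right choice for non-stiff problems.
sol = solve_ivp(f, (0.0, 10.0), [1.0, 0.0], method="Radau")
print(sol.success, sol.t.size)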


【14】Multi-Play Combinatorial Semi-Bandit Problem
标题:多局组合半强盗问题
链接:https://arxiv.org/abs/2509.09933

作者:Nakamura, Yuko Kuroki, Wei Chen
摘要:在组合半强盗(CSB)问题中,玩家从组合动作集中选择一个动作,并观察该动作所包含基臂的反馈。虽然CSB广泛适用于组合优化问题,但其对二值决策空间的限制排除了涉及非负整数流或分配的重要情形,如最优运输和背包问题。为了克服这一限制,我们提出了多局组合半强盗(MP-CSB),其中玩家可以选择非负整数动作,并在每一轮从单个臂观察到多个反馈。我们为MP-CSB提出了两种算法。一种是基于汤普森采样的算法,即使动作空间相对于臂数呈指数级大,它在计算上也是可行的,并且在随机机制中获得$O(\log T)$的分布依赖后悔,其中$T$是时间范围。另一种是"两个世界最优"算法,它在随机机制中实现$O(\log T)$的方差依赖后悔,在对抗机制中实现最坏情况$\tilde{\mathcal{O}}\left(\sqrt{T}\right)$的后悔。此外,它在对抗机制中的后悔是数据依赖的,能够自适应于最优动作的累积损失、总二次变差以及损失序列的路径长度。最后,数值实验表明,所提算法优于CSB文献中的现有方法。
摘要:In the combinatorial semi-bandit (CSB) problem, a player selects an action from a combinatorial action set and observes feedback from the base arms included in the action. While CSB is widely applicable to combinatorial optimization problems, its restriction to binary decision spaces excludes important cases involving non-negative integer flows or allocations, such as the optimal transport and knapsack problems. To overcome this limitation, we propose the multi-play combinatorial semi-bandit (MP-CSB), where a player can select a non-negative integer action and observe multiple feedbacks from a single arm in each round. We propose two algorithms for the MP-CSB. One is a Thompson-sampling-based algorithm that is computationally feasible even when the action space is exponentially large with respect to the number of arms, and attains $O(\log T)$ distribution-dependent regret in the stochastic regime, where $T$ is the time horizon. The other is a best-of-both-worlds algorithm, which achieves $O(\log T)$ variance-dependent regret in the stochastic regime and the worst-case $\tilde{\mathcal{O}}\left( \sqrt{T} \right)$ regret in the adversarial regime. Moreover, its regret in the adversarial regime is data-dependent, adapting to the cumulative loss of the optimal action, the total quadratic variation, and the path-length of the loss sequence. Finally, we numerically show that the proposed algorithms outperform existing methods in the CSB literature.


【15】Latency and Token-Aware Test-Time Compute
标题:延迟和令牌感知测试时计算
链接:https://arxiv.org/abs/2509.09864

作者:Huang, Mehul Damani, Yousef El-Kurdi, Ramon Astudillo, Wei Sun
摘要:推理时缩放已成为一种通过生成多个候选响应并从中选择来提高大型语言模型(LLM)性能的强大方法。然而,现有关于测试时计算动态分配的工作通常只考虑best-of-N等并行生成方法,忽略了波束搜索等增量解码方法,并且大多忽视延迟,只关注令牌用量。我们将推理时缩放表述为动态计算分配与方法选择问题:系统必须针对每个查询决定应用哪种策略以及分配多少计算。我们的框架显式地同时考虑令牌成本和挂钟延迟,后者对用户体验至关重要,对于模型必须高效发出多个查询的代理工作流尤其如此。推理基准上的实验表明,我们的方法始终优于静态策略,在保持部署实用性的同时实现了有利的精度-成本权衡。
摘要:Inference-time scaling has emerged as a powerful way to improve large language model (LLM) performance by generating multiple candidate responses and selecting among them. However, existing work on dynamic allocation for test-time compute typically considers only parallel generation methods such as best-of-N, overlooking incremental decoding methods like beam search, and has largely ignored latency, focusing only on token usage. We formulate inference-time scaling as a problem of dynamic compute allocation and method selection, where the system must decide which strategy to apply and how much compute to allocate on a per-query basis. Our framework explicitly incorporates both token cost and wall-clock latency, the latter being critical for user experience and particularly for agentic workflows where models must issue multiple queries efficiently. Experiments on reasoning benchmarks show that our approach consistently outperforms static strategies, achieving favorable accuracy-cost trade-offs while remaining practical for deployment.
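One way to picture the formulation, as a toy scoring rule with invented numbers: each candidate strategy is scored by expected accuracy minus weighted token and latency costs, and the best one is chosen per query.

strategies = {
    "best_of_4": {"acc": 0.78, "tokens": 4000, "latency_s": 2.0},  # parallel
    "beam_4":    {"acc": 0.80, "tokens": 2500, "latency_s": 6.0},  # incremental
    "greedy":    {"acc": 0.70, "tokens": 1000, "latency_s": 1.5},
}
lam_tok, lam_lat = 1e-5, 0.01                     # per-query cost weights
best = max(strategies, key=lambda s: strategies[s]["acc"]
           - lam_tok * strategies[s]["tokens"]
           - lam_lat * strategies[s]["latency_s"])
print(best)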


【16】Distinguishing Startle from Surprise Events Based on Physiological Signals
标题:基于生理信号区分惊吓与惊讶事件
链接:https://arxiv.org/abs/2509.09799

作者:rma, Alexandre Duchevet, Florian Daiber, Jean-Paul Imbert, Maurice Rekrut
摘要:意外事件可能会影响注意力并延迟决策,在航空等高风险环境中构成严重的安全风险。特别是,像惊吓和惊讶这样的反应会以不同的方式影响飞行员的表现,但在实践中往往很难区分。现有的研究在很大程度上分别研究了这些反应,对它们的综合效应或如何使用生理数据区分它们的关注有限。在这项工作中,我们通过使用机器学习和多模态融合策略基于生理信号区分惊吓和惊喜事件来解决这一差距。我们的研究结果表明,这些事件可以可靠地预测,使用SVM和后期融合实现了85.7%的最高平均准确率。为了进一步验证我们模型的鲁棒性,我们扩展了评估以包括基线条件,成功区分了惊吓,惊喜和基线状态,使用XGBoost和Late Fusion的平均准确率最高为74.9%。
摘要:Unexpected events can impair attention and delay decision-making, posing serious safety risks in high-risk environments such as aviation. In particular, reactions like startle and surprise can impact pilot performance in different ways, yet are often hard to distinguish in practice. Existing research has largely studied these reactions separately, with limited focus on their combined effects or how to differentiate them using physiological data. In this work, we address this gap by distinguishing between startle and surprise events based on physiological signals using machine learning and multi-modal fusion strategies. Our results demonstrate that these events can be reliably predicted, achieving a highest mean accuracy of 85.7% with SVM and Late Fusion. To further validate the robustness of our model, we extended the evaluation to include a baseline condition, successfully differentiating between Startle, Surprise, and Baseline states with a highest mean accuracy of 74.9% with XGBoost and Late Fusion.
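Late fusion as used here has a compact generic form: train one classifier per modality and average their predicted probabilities at decision time. A sketch with random placeholder features, not the study's physiological data:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
Xa, Xb = rng.random((100, 10)), rng.random((100, 6))  # two modalities
y = rng.integers(0, 2, 100)                           # startle vs. surprise
clf_a = SVC(probability=True).fit(Xa, y)
clf_b = SVC(probability=True).fit(Xb, y)
fused = (clf_a.predict_proba(Xa) + clf_b.predict_proba(Xb)) / 2  # late fusion
print((fused.argmax(1) == y).mean())                  # training accuracy only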


【17】From the Gradient-Step Denoiser to the Proximal Denoiser and their associated convergent Plug-and-Play algorithms
标题:从梯度步降噪器到近端降噪器及其相关的收敛即插即用算法
链接:https://arxiv.org/abs/2509.09793

作者:erfeld, Baudouin Denis de Senneville, Arthur Leclaire, Nicolas Papadakis
摘要:在本文中,我们分析了梯度步降噪器(Gradient-Step Denoiser)及其在即插即用算法中的使用。优化算法的即插即用范式使用现成的降噪器来代替图像先验的邻近算子或梯度下降算子。通常这种图像先验是隐式的、无法显式表达的,而梯度步降噪器被训练为恰好是某个显式泛函的梯度下降算子或邻近算子,同时保持最先进的去噪能力。
摘要:In this paper we analyze the Gradient-Step Denoiser and its usage in Plug-and-Play algorithms. The Plug-and-Play paradigm of optimization algorithms uses off the shelf denoisers to replace a proximity operator or a gradient descent operator of an image prior. Usually this image prior is implicit and cannot be expressed, but the Gradient-Step Denoiser is trained to be exactly the gradient descent operator or the proximity operator of an explicit functional while preserving state-of-the-art denoising capabilities.


【18】LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation
标题:LAVa:具有动态预算分配的逐层KV缓存驱逐
链接:https://arxiv.org/abs/2509.09754

作者:n, Song Yuan, Zhengze Zhang, Xiaoliang Wang, Daxin Jiang, Nguyen Cam-Tu
摘要:KV缓存通常用于加速长上下文的LLM推理,但其高内存需求推动了对缓存压缩的需求。然而,现有的压缩方法在很大程度上是启发式的,并且缺乏动态预算分配。为了解决这一限制,我们引入了一个统一的框架,通过最小化Transformer残差流中的信息损失来压缩缓存。在此基础上,我们分析了层注意力输出损失,并推导出一个新指标来比较各注意力头之间的缓存条目,从而实现具有动态头预算的逐层压缩。此外,通过对比跨层信息,我们还实现了动态层预算。LAVa是第一个用于缓存驱逐和动态预算分配的统一策略;与以前的方法不同,它不依赖训练或多种策略的组合。在LongBench、Needle-In-A-Haystack、Ruler和InfiniteBench等基准上的实验证明了其优越性。此外,我们的实验揭示了一个新的见解:动态层预算对于生成任务至关重要(例如代码补全),而动态头预算在抽取任务中起关键作用(例如抽取式QA)。作为一种完全动态的压缩方法,LAVa在各种任务类型中始终保持最佳性能。我们的代码可在https://github.com/MGDDestiny/Lava上获得。
摘要 :KV Cache is commonly used to accelerate LLM inference with long contexts, yet its high memory demand drives the need for cache compression. Existing compression methods, however, are largely heuristic and lack dynamic budget allocation. To address this limitation, we introduce a unified framework for cache compression by minimizing information loss in Transformer residual streams. Building on it, we analyze the layer attention output loss and derive a new metric to compare cache entries across heads, enabling layer-wise compression with dynamic head budgets. Additionally, by contrasting cross-layer information, we also achieve dynamic layer budgets. LAVa is the first unified strategy for cache eviction and dynamic budget allocation that, unlike prior methods, does not rely on training or the combination of multiple strategies. Experiments with benchmarks (LongBench, Needle-In-A-Haystack, Ruler, and InfiniteBench) demonstrate its superiority. Moreover, our experiments reveal a new insight: dynamic layer budgets are crucial for generation tasks (e.g., code completion), while dynamic head budgets play a key role in extraction tasks (e.g., extractive QA). As a fully dynamic compression method, LAVa consistently maintains top performance across task types. Our code is available at https://github.com/MGDDestiny/Lava.
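A schematic of dynamic head budgets, with a stand-in scoring rule rather than LAVa's derived metric: cache entries are scored per head, and each head's keep-budget is set in proportion to its share of the total score mass.

import torch

scores = torch.rand(8, 1024)       # (heads, cached tokens); stand-in importance metric
total_budget = 2048                # entries to keep across all heads
head_budget = (scores.sum(1) / scores.sum() * total_budget).long()
keep = [torch.topk(scores[h], int(head_budget[h])).indices  # per-head retention
        for h in range(scores.shape[0])]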


【19】MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools
标题:MCP-AgentBench:使用MCP介导的工具评估真实世界的语言代理性能
链接:https://arxiv.org/abs/2509.09734

作者:o, Benfeng Xu, Chiwei Zhu, Wentao Hong, Xiaorui Wang, Zhendong Mao
摘要:模型上下文协议(MCP)正在迅速成为一个关键的开放标准,旨在增强代理与工具的集成和互操作性,有望开启一个强大、互联且真正实用的代理式AI新时代。然而,尽管MCP的采用越来越多,现有基准往往无法刻画这一新范式下真实世界的代理性能,导致对其真正运营价值的认知失真,也无法可靠地区分不同代理的能力水平。为了弥合这一关键的评估差距,我们引入了MCP-AgentBench,一个专门用于严格评估语言代理在MCP介导的工具交互中能力的综合基准。MCP-AgentBench的核心贡献包括:建立了一个强大的MCP测试平台,包括33个可运行服务器和188个不同工具;开发了一个基准,其中包含600个系统设计的查询,分布在交互复杂度各异的6个类别中;并引入了MCP-Eval,一种新的以结果为导向、优先考虑真实世界任务成功率的评估方法。通过对领先语言代理的广泛实证评估,我们提供了基础性的见解。MCP-AgentBench旨在为研究界提供一个标准化且可靠的框架,以构建、验证和推进能够充分利用MCP变革性优势的代理,从而加速实现真正有能力且可互操作的AI系统。
摘要:The Model Context Protocol (MCP) is rapidly emerging as a pivotal open standard, designed to enhance agent-tool integration and interoperability, and is positioned to unlock a new era of powerful, interconnected, and genuinely utilitarian agentic AI. However, despite MCP's growing adoption, existing benchmarks often fail to capture real-world agent performance within this new paradigm, leading to a distorted perception of their true operational value and an inability to reliably differentiate proficiencies. To bridge this critical evaluation gap, we introduce MCP-AgentBench -- a comprehensive benchmark specifically engineered to rigorously assess language agent capabilities in MCP-mediated tool interactions. Core contributions of MCP-AgentBench include: the establishment of a robust MCP testbed comprising 33 operational servers with 188 distinct tools; the development of a benchmark featuring 600 systematically designed queries distributed across 6 distinct categories of varying interaction complexity; and the introduction of MCP-Eval, a novel outcome-oriented evaluation methodology prioritizing real-world task success. Through extensive empirical evaluation of leading language agents, we provide foundational insights. MCP-AgentBench aims to equip the research community with a standardized and reliable framework to build, validate, and advance agents capable of fully leveraging MCP's transformative benefits, thereby accelerating progress toward truly capable and interoperable AI systems.


【20】Clip Your Sequences Fairly: Enforcing Length Fairness for Sequence-Level RL
标题:公平地剪辑您的序列:强制序列级RL的长度公平
链接:https://arxiv.org/abs/2509.09177

作者:, Quanjia Xiao, Lei Pang, Haixiao Liu
摘要:我们提出了FSPO(公平序列策略优化),这是一种面向LLM的序列级强化学习方法,它直接在重要性采样(IS)权重空间中执行长度公平的裁剪。我们重新审视了序列级RL方法,并发现将PPO/GRPO风格的裁剪移植到序列上时存在不匹配:固定的裁剪范围系统性地对短响应与长响应重新加权,扭曲了有效目标。在理论上,我们通过长度重加权误差(LRE)形式化长度公平性,并证明小的LRE可在裁剪更新与真实更新之间给出方向余弦保证。FSPO引入了一个简单的、以高斯假设为动机的补救措施:我们用一个带KL校正漂移项、宽度按$\sqrt{L}$缩放的区间来裁剪序列对数IS比。实验上,FSPO拉平了各长度区间的裁剪率,稳定了训练,并在多个评估数据集上优于所有基线。
摘要:We propose FSPO (Fair Sequence Policy Optimization), a sequence-level reinforcement learning method for LLMs that enforces length-fair clipping directly in the importance-sampling (IS) weight space. We revisit sequence-level RL methods and identify a mismatch when PPO/GRPO-style clipping is transplanted to sequences: a fixed clip range systematically reweights short vs. long responses, distorting the effective objective. Theoretically, we formalize length fairness via a Length Reweighting Error (LRE) and prove that small LRE yields a directional cosine guarantee between the clipped and true updates. FSPO introduces a simple, Gaussian-motivated remedy: we clip the sequence log-IS ratio with a band that applies a KL-corrected drift term and scales as $\sqrt{L}$. Empirically, FSPO flattens clip rates across length bins, stabilizes training, and outperforms all baselines across multiple evaluation datasets.
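A minimal sketch of the length-fair clipping described above; the exact form of the KL-corrected drift term is not specified in the abstract, so drift is left as an assumed input.

import torch

def fspo_clip(logp_new, logp_old, eps=0.2, drift=0.0):
    # Sequence-level log importance ratio: sum token log-probs over length L.
    L = logp_new.shape[-1]
    log_ratio = (logp_new - logp_old).sum(-1) - drift
    band = eps * (L ** 0.5)                 # clip band widens as sqrt(L)
    return log_ratio.clamp(-band, band)

short = fspo_clip(torch.randn(4, 10), torch.randn(4, 10))        # L = 10
long_ = fspo_clip(torch.randn(4, 1000), torch.randn(4, 1000))    # wider band at L = 1000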


【21】Differentially Private Decentralized Dataset Synthesis Through Randomized Mixing with Correlated Noise
标题:通过与相关噪声的随机混合实现差分隐私的去中心化数据集合成
链接:https://arxiv.org/abs/2509.10385

作者:a, Tanvir Muntakim Tonoy, Hafiz Imtiaz
备注:This work has been submitted to the IEEE for possible publication
摘要:在这项工作中,我们在去中心化数据设置下探索差分隐私合成数据生成,其基础是最近提出的差分隐私类中心数据聚合(DP-CDA)。DP-CDA在集中式设置中合成数据:混合来自同一类别的多个随机选择的样本并注入仔细校准的高斯噪声,以确保$(\epsilon, \delta)$-差分隐私。当部署在去中心化或联邦环境中、每个客户端只持有一小部分数据时,DP-CDA面临新的挑战。每个客户端有限的样本量增加了本地计算的敏感度,需要注入更多噪声来维持差分隐私保证,这反过来又导致效用相比集中式设置明显下降。为了缓解这个问题,我们将相关辅助隐私估计(CAPE)协议集成到联邦DP-CDA框架中,提出了CAPE辅助的联邦DP-CDA算法。CAPE允许客户端之间进行有限的协作:客户端生成联合分布的(反相关)噪声,这些噪声在聚合时相互抵消,同时在个体层面保护隐私。这种技术显著改善了联邦设置中的隐私-效用权衡。在MNIST和FashionMNIST数据集上的大量实验表明,所提出的CAPE辅助联邦DP-CDA方法在某些参数范围内可以实现与其集中式对应方法相当的效用,同时保持严格的差分隐私保证。
摘要:In this work, we explore differentially private synthetic data generation in a decentralized-data setting by building on the recently proposed Differentially Private Class-Centric Data Aggregation (DP-CDA). DP-CDA synthesizes data in a centralized setting by mixing multiple randomly-selected samples from the same class and injecting carefully calibrated Gaussian noise, ensuring $(\epsilon, \delta)$-differential privacy. When deployed in a decentralized or federated setting, where each client holds only a small partition of the data, DP-CDA faces new challenges. The limited sample size per client increases the sensitivity of local computations, requiring higher noise injection to maintain the differential privacy guarantee. This, in turn, leads to a noticeable degradation in the utility compared to the centralized setting. To mitigate this issue, we integrate the Correlation-Assisted Private Estimation (CAPE) protocol into the federated DP-CDA framework and propose the CAPE Assisted Federated DP-CDA algorithm. CAPE enables limited collaboration among the clients by allowing them to generate jointly distributed (anti-correlated) noise that cancels out in aggregate, while preserving privacy at the individual level. This technique significantly improves the privacy-utility trade-off in the federated setting. Extensive experiments on MNIST and FashionMNIST datasets demonstrate that the proposed CAPE Assisted Federated DP-CDA approach can achieve utility comparable to its centralized counterpart under some parameter regime, while maintaining rigorous differential privacy guarantees.
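The anti-correlated-noise trick can be seen in a few lines: draw i.i.d. Gaussians and subtract their mean, so each client's share still looks Gaussian while the aggregate cancels exactly. A toy version of the idea, not the CAPE protocol itself:

import numpy as np

rng = np.random.default_rng(0)
n_clients, sigma = 8, 1.0
e = rng.normal(0.0, sigma, n_clients)   # independent per-client noise
e_anti = e - e.mean()                   # jointly distributed, sums to zero
print(e_anti.sum())                     # ~0 up to floating point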


【22】Matrix-free Neural Preconditioner for the Dirac Operator in Lattice Gauge Theory
标题:格点规范理论中Dirac算子的无矩阵神经预条件子
链接:https://arxiv.org/abs/2509.10378

作者:n, Srinivas Eswar, Yin Lin, William Detmold, Phiala Shanahan, Xiaoye Li, Yang Liu, Prasanna Balaprakash
摘要:在格点量子色动力学(QCD)中,产生样本和计算可观测量的过程中都会出现线性系统。求解这些稀疏但病态的Hermitian正定系统需要使用共轭梯度(CG)等迭代方法,耗时且计算昂贵。预条件子可以有效加速这一过程,目前最先进的是多重网格预条件子。然而,构造有用的预条件子可能颇具挑战,会增加额外的计算开销,在大型线性系统中尤其如此。我们提出一个框架,利用算子学习技术构造线性映射作为有效的预条件子。本工作的方法不依赖于原始线性系统或所生成预条件子的显式矩阵,从而可以高效地训练模型并在CG求解器中应用。在Schwinger模型(1+1时空维、含两个简并质量费米子的U(1)规范理论)的背景下,这种预条件方案有效地降低了线性系统的条件数,并在相关参数范围内将收敛所需的迭代次数减少约一半。我们进一步证明,该框架学习到的是依赖于格点结构的一般映射,这使其对由不同尺寸规范场位形构造的Dirac算子具有zero-shot学习能力。
摘要:Linear systems arise in generating samples and in calculating observables in lattice quantum chromodynamics (QCD). Solving the Hermitian positive definite systems, which are sparse but ill-conditioned, involves using iterative methods, such as Conjugate Gradient (CG), which are time-consuming and computationally expensive. Preconditioners can effectively accelerate this process, with the state-of-the-art being multigrid preconditioners. However, constructing useful preconditioners can be challenging, adding additional computational overhead, especially in large linear systems. We propose a framework, leveraging operator learning techniques, to construct linear maps as effective preconditioners. The method in this work does not rely on explicit matrices from either the original linear systems or the produced preconditioners, allowing efficient model training and application in the CG solver. In the context of the Schwinger model (U(1) gauge theory in 1+1 spacetime dimensions with two degenerate-mass fermions), this preconditioning scheme effectively decreases the condition number of the linear systems and approximately halves the number of iterations required for convergence in relevant parameter ranges. We further demonstrate the framework learns a general mapping dependent on the lattice structure which leads to zero-shot learning ability for the Dirac operators constructed from gauge field configurations of different sizes.
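The matrix-free interface is the key practical point: CG only ever needs the actions A@v and M@v, so a learned map can be plugged in as a function. The sketch below uses a simple Jacobi (diagonal) map as a stand-in for the paper's learned preconditioner.

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

n = 200
diag = np.linspace(1.0, 1e3, n)          # ill-conditioned SPD spectrum
A = LinearOperator((n, n), matvec=lambda v: diag * v, dtype=float)
M = LinearOperator((n, n), matvec=lambda v: v / diag, dtype=float)  # preconditioner action
b = np.ones(n)
iters = []
x, info = cg(A, b, M=M, callback=lambda xk: iters.append(1))
print("converged:", info == 0, "iterations:", len(iters))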


【23】Repulsive Monte Carlo on the sphere for the sliced Wasserstein distance
标题:用于切片Wasserstein距离的球面排斥蒙特卡洛方法
链接:https://arxiv.org/abs/2509.10166

作者:Petrovic, Rémi Bardenet, Agnès Desolneux
摘要:在本文中,我们考虑用蒙特卡洛方法计算任意维度单位球面上函数积分的问题。虽然我们提出的方法是通用的,但我们的主线是$\mathbb{R}^d$上两个测度之间的切片Wasserstein距离,它恰好是$d$维球面上的一个积分。切片Wasserstein距离(SW)在机器学习中日益受到关注,既可作为计算上更难处理的Wasserstein距离的代理,也可作为一种独立的距离,特别是由于其内在地缓解了维数灾难。最近已有针对切片Wasserstein求积的数值基准,而我们的视角有所不同:我们专注于节点相互排斥(即负相关)的求积。事实上,当求积适配于积分任务时,负相关性可以带来方差缩减。我们的第一个贡献是从最近关于行列式点过程(DPP)和排斥点过程的文献中提取并论证相应的求积方法,以及从切片Wasserstein距离的专门文献中提取排斥性求积,然后对这些求积进行数值基准测试。此外,我们还分析了正交蒙特卡洛估计量UnifOrtho的方差。我们的分析阐明了UnifOrtho在高维切片Wasserstein估计中的成功,以及文献中的反例。我们对计算切片Wasserstein距离的最终建议是:低维使用随机化准蒙特卡洛,高维使用UnifOrtho。基于DPP的求积只有在准蒙特卡洛同样奏效时才会出彩,而排斥性求积总体上表现出适度的方差缩减,但要使其鲁棒还需要更多理论工作。
摘要:In this paper, we consider the problem of computing the integral of a function on the unit sphere, in any dimension, using Monte Carlo methods. Although the methods we present are general, our guiding thread is the sliced Wasserstein distance between two measures on $\mathbb{R}^d$, which is precisely an integral on the $d$-dimensional sphere. The sliced Wasserstein distance (SW) has gained momentum in machine learning either as a proxy to the less computationally tractable Wasserstein distance, or as a distance in its own right, due in particular to its built-in alleviation of the curse of dimensionality. There has been recent numerical benchmarks of quadratures for the sliced Wasserstein, and our viewpoint differs in that we concentrate on quadratures where the nodes are repulsive, i.e. negatively dependent. Indeed, negative dependence can bring variance reduction when the quadrature is adapted to the integration task. Our first contribution is to extract and motivate quadratures from the recent literature on determinantal point processes (DPPs) and repelled point processes, as well as repulsive quadratures from the literature specific to the sliced Wasserstein distance. We then numerically benchmark these quadratures. Moreover, we analyze the variance of the UnifOrtho estimator, an orthogonal Monte Carlo estimator. Our analysis sheds light on UnifOrtho's success for the estimation of the sliced Wasserstein in large dimensions, as well as counterexamples from the literature. Our final recommendation for the computation of the sliced Wasserstein distance is to use randomized quasi-Monte Carlo in low dimensions and \emph{UnifOrtho} in large dimensions. DPP-based quadratures only shine when quasi-Monte Carlo also does, while repelled quadratures show moderate variance reduction in general, but more theoretical effort is needed to make them robust.
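For reference, the plain i.i.d. Monte Carlo baseline that the repulsive quadratures improve upon: average the closed-form 1-D Wasserstein distance over random directions on the sphere. Equal sample sizes are assumed so the sorted-sample formula applies.

import numpy as np

def sliced_w2(X, Y, n_proj=128, rng=np.random.default_rng()):
    # Estimate SW_2 by averaging 1-D W2^2 over uniform directions theta.
    d, total = X.shape[1], 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)            # uniform on the unit sphere
        x, y = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean((x - y) ** 2)            # 1-D W2^2 via order statistics
    return np.sqrt(total / n_proj)

X, Y = np.random.randn(500, 10), np.random.randn(500, 10) + 1.0
print(sliced_w2(X, Y))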


【24】Automated Tuning for Diffusion Inverse Problem Solvers without Generative Prior Retraining
标题:无需重新训练生成先验的扩散逆问题求解器自动调参
链接:https://arxiv.org/abs/2509.09880

作者:u Alçalar, Junno Yun, Mehmet Akçakaya
备注:IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2025
摘要:基于扩散/分数的模型最近成为求解逆问题(包括加速MRI重建)的强大生成先验。虽然其灵活性允许将测量模型与学习到的先验解耦,但其性能在很大程度上取决于仔细调节的数据保真度权重,在去噪步数很少的快速采样调度下尤其如此。现有方法往往依赖启发式规则或固定权重,无法泛化到不同的测量条件和不规则的时间步调度。在这项工作中,我们提出了零样本自适应扩散采样(ZADS),一种测试时优化方法,可在任意噪声调度下自适应调节保真度权重,而无需重新训练扩散先验。ZADS将去噪过程视为固定的展开采样器,并仅使用欠采样测量以自监督方式优化保真度权重。在fastMRI膝关节数据集上的实验表明,ZADS始终优于传统压缩感知和近期基于扩散的方法,展示了其在不同噪声调度和采集设置下提供高保真重建的能力。
摘要:Diffusion/score-based models have recently emerged as powerful generative priors for solving inverse problems, including accelerated MRI reconstruction. While their flexibility allows decoupling the measurement model from the learned prior, their performance heavily depends on carefully tuned data fidelity weights, especially under fast sampling schedules with few denoising steps. Existing approaches often rely on heuristics or fixed weights, which fail to generalize across varying measurement conditions and irregular timestep schedules. In this work, we propose Zero-shot Adaptive Diffusion Sampling (ZADS), a test-time optimization method that adaptively tunes fidelity weights across arbitrary noise schedules without requiring retraining of the diffusion prior. ZADS treats the denoising process as a fixed unrolled sampler and optimizes fidelity weights in a self-supervised manner using only undersampled measurements. Experiments on the fastMRI knee dataset demonstrate that ZADS consistently outperforms both traditional compressed sensing and recent diffusion-based methods, showcasing its ability to deliver high-fidelity reconstructions across varying noise schedules and acquisition settings.


机器翻译由腾讯交互翻译提供,仅供参考

点击“阅读原文”获取带摘要的学术速递

Python社区是高质量的Python/Django开发社区
本文地址:http://www.python88.com/topic/186800
 