Machine Learning Academic Digest [8.14]

arXiv Daily Academic Digest • 2 weeks ago • 444 views



cs.LG: 146 papers today


Large models (17 papers)

【1】Neural Bandit Based Optimal LLM Selection for a Pipeline of Tasks
Link: https://arxiv.org/abs/2508.09958

Authors: lar, Eddie Zhang, Carlee Joe-Wong
Comments: Submitted to AAAI 2026
Abstract: With the increasing popularity of large language models (LLMs) for a variety of tasks, there has been a growing interest in strategies that can predict which out of a set of LLMs will yield a successful answer at low cost. This problem promises to become more and more relevant as providers like Microsoft allow users to easily create custom LLM "assistants" specialized to particular types of queries. However, some tasks (i.e., queries) may be too specialized and difficult for a single LLM to handle alone. These applications often benefit from breaking down the task into smaller subtasks, each of which can then be executed by an LLM expected to perform well on that specific subtask. For example, in extracting a diagnosis from medical records, one can first select an LLM to summarize the record, select another to validate the summary, and then select another, possibly different, LLM to extract the diagnosis from the summarized record. Unlike existing LLM selection or routing algorithms, this setting requires that we select a sequence of LLMs, with the output of each LLM feeding into the next and potentially influencing its success. Thus, unlike single LLM selection, the quality of each subtask's output directly affects the inputs, and hence the cost and success rate, of downstream LLMs, creating complex performance dependencies that must be learned and accounted for during selection. We propose a neural contextual bandit-based algorithm that trains neural networks that model LLM success on each subtask in an online manner, thus learning to guide the LLM selections for the different subtasks, even in the absence of historical LLM performance data. Experiments on telecommunications question answering and medical diagnosis prediction datasets illustrate the effectiveness of our proposed approach compared to other LLM selection algorithms.


【2】Specialised or Generic? Tokenization Choices for Radiology Language Models
Link: https://arxiv.org/abs/2508.09952

Authors: Warr, Wentian Xu, Harry Anthony, Yasin Ibrahim, Daniel McGowan, Konstantinos Kamnitsas
Comments: Accepted to ELAMI@MICCAI2025
Abstract: The vocabulary used by language models (LMs), defined by the tokenizer, plays a key role in text generation quality. However, its impact remains under-explored in radiology. In this work, we address this gap by systematically comparing general, medical, and domain-specific tokenizers on the task of radiology report summarisation across three imaging modalities. We also investigate scenarios with and without LM pre-training on PubMed abstracts. Our findings demonstrate that medical and domain-specific vocabularies outperform widely used natural language alternatives when models are trained from scratch. Pre-training partially mitigates performance differences between tokenizers, whilst the domain-specific tokenizers achieve the most favourable results. Domain-specific tokenizers also reduce memory requirements due to smaller vocabularies and shorter sequences. These results demonstrate that adapting the vocabulary of LMs to the clinical domain provides practical benefits, including improved performance and reduced computational demands, making such models more accessible and effective for both research and real-world healthcare settings.


【3】A Comprehensive Evaluation framework of Alignment Techniques for LLMs
Link: https://arxiv.org/abs/2508.09937

Authors: zmat, Momin Abbas, Maysa Malfiza Garcia de Macedo, Marcelo Carpinette Grave, Luan Soares de Souza, Tiago Machado, Rogerio A de Paula, Raya Horesh, Yixin Chen, Heloisa Caroline de Souza Pereira Candello, Rebecka Nordenlow, Aminat Adebiyi
Comments: In submission
Abstract: As Large Language Models (LLMs) become increasingly integrated into real-world applications, ensuring their outputs align with human values and safety standards has become critical. The field has developed diverse alignment approaches including traditional fine-tuning methods (RLHF, instruction tuning), post-hoc correction systems, and inference-time interventions, each with distinct advantages and limitations. However, the lack of unified evaluation frameworks makes it difficult to systematically compare these paradigms and guide deployment decisions. This paper introduces a multi-dimensional evaluation of alignment techniques for LLMs, a comprehensive evaluation framework that provides a systematic comparison across all major alignment paradigms. Our framework assesses methods along four key dimensions: alignment detection, alignment quality, computational efficiency, and robustness. Through experiments across diverse base models and alignment strategies, we demonstrate the utility of our framework in identifying strengths and limitations of current state-of-the-art models, providing valuable insights for future research directions.


【4】Beyond Naïve Prompting: Strategies for Improved Zero-shot Context-aided Forecasting with LLMs
Link: https://arxiv.org/abs/2508.09904

Authors: ok, Andrew Robert Williams, Vincent Zhihao Zheng, Irina Rish, Nicolas Chapados, Étienne Marcotte, Valentina Zantedeschi, Alexandre Drouin
Abstract: Forecasting in real-world settings requires models to integrate not only historical data but also relevant contextual information, often available in textual form. While recent work has shown that large language models (LLMs) can be effective context-aided forecasters via naïve direct prompting, their full potential remains underexplored. We address this gap with 4 strategies, providing new insights into the zero-shot capabilities of LLMs in this setting. ReDP improves interpretability by eliciting explicit reasoning traces, allowing us to assess the model's reasoning over the context independently from its forecast accuracy. CorDP leverages LLMs solely to refine existing forecasts with context, enhancing their applicability in real-world forecasting pipelines. IC-DP proposes embedding historical examples of context-aided forecasting tasks in the prompt, substantially improving accuracy even for the largest models. Finally, RouteDP optimizes resource efficiency by using LLMs to estimate task difficulty, and routing the most challenging tasks to larger models. Evaluated on different kinds of context-aided forecasting tasks from the CiK benchmark, our strategies demonstrate distinct benefits over naïve prompting across LLMs of different sizes and families. These results open the door to further simple yet effective improvements in LLM-based context-aided forecasting.


【5】Improving Diversity in Language Models: When Temperature Fails, Change the Loss
Link: https://arxiv.org/abs/2508.09654

Authors: Verine, Florian Le Bronnec, Kunhao Zheng, Alexandre Allauzen, Yann Chevaleyre, Benjamin Negrevergne
Comments: Forty-Second International Conference on Machine Learning, ICML2025
Abstract: Increasing diversity in language models is a challenging yet essential objective. A common approach is to raise the decoding temperature. In this work, we investigate this approach through a simplistic yet common case to provide insights into why decreasing temperature can improve quality (Precision), while increasing it often fails to boost coverage (Recall). Our analysis reveals that for a model to be effectively tunable through temperature adjustments, it must be trained toward coverage. To address this, we propose rethinking loss functions in language models by leveraging the Precision-Recall framework. Our results demonstrate that this approach achieves a substantially better trade-off between Precision and Recall than merely combining negative log-likelihood training with temperature scaling. These findings offer a pathway toward more versatile and robust language modeling techniques.
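
For readers unfamiliar with the decoding temperature mentioned in this abstract, here is a minimal sketch of standard temperature scaling of a softmax distribution. It is not the paper's proposed loss; the logits are made-up values and the function names are illustrative assumptions. It only shows that a lower temperature concentrates probability on the top token while a higher one flattens the distribution.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a more spread-out distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical next-token logits, chosen only for illustration.
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

Raising the temperature can only redistribute mass the model already assigns, which is consistent with the abstract's point that temperature alone may not recover coverage the model was never trained toward.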


【6】Interpretable Robot Control via Structured Behavior Trees and Large Language Models
Link: https://arxiv.org/abs/2508.09621

Authors: éva Chekam, Ines Pastor-Martinez, Ali Tourani, Jose Andres Millan-Romera, Laura Ribeiro, Pedro Miguel Bastos Soares, Holger Voos, Jose Luis Sanchez-Lopez
Comments: 15 pages, 5 figures, 3 tables
Abstract: As intelligent robots become more integrated into human environments, there is a growing need for intuitive and reliable Human-Robot Interaction (HRI) interfaces that are adaptable and more natural to interact with. Traditional robot control methods often require users to adapt to interfaces or memorize predefined commands, limiting usability in dynamic, unstructured environments. This paper presents a novel framework that bridges natural language understanding and robotic execution by combining Large Language Models (LLMs) with Behavior Trees. This integration enables robots to interpret natural language instructions given by users and translate them into executable actions by activating domain-specific plugins. The system supports scalable and modular integration, with a primary focus on perception-based functionalities, such as person tracking and hand gesture recognition. To evaluate the system, a series of real-world experiments was conducted across diverse environments. Experimental results demonstrate that the proposed approach is practical in real-world scenarios, with an average cognition-to-execution accuracy of approximately 94%, making a significant contribution to HRI systems and robots. The complete source code of the framework is publicly available at https://github.com/snt-arg/robot_suite.


【7】SYNAPSE-G: Bridging Large Language Models and Graph Learning for Rare Event Classification
Link: https://arxiv.org/abs/2508.09544

Authors: akkol, Lin Chen, Max Springer, Abigail Schantz, Blaž Bratanič, Vincent Cohen-Addad, MohammadHossein Bateni
Abstract: Scarcity of labeled data, especially for rare events, hinders training effective machine learning models. This paper proposes SYNAPSE-G (Synthetic Augmentation for Positive Sampling via Expansion on Graphs), a novel pipeline leveraging Large Language Models (LLMs) to generate synthetic training data for rare event classification, addressing the cold-start problem. This synthetic data serves as seeds for semi-supervised label propagation on a similarity graph constructed between the seeds and a large unlabeled dataset. This identifies candidate positive examples, subsequently labeled by an oracle (human or LLM). The expanded dataset then trains/fine-tunes a classifier. We theoretically analyze how the quality (validity and diversity) of the synthetic data impacts the precision and recall of our method. Experiments on the imbalanced SST2 and MHS datasets demonstrate SYNAPSE-G's effectiveness in finding positive labels, outperforming baselines including nearest neighbor search.


【8】Enhancing Memory Recall in LLMs with Gauss-Tin: A Hybrid Instructional and Gaussian Replay Approach
Link: https://arxiv.org/abs/2508.09510

Authors: akhiroh, Thomas Fevens
Abstract: Despite the significant advancements in Large Language Models (LLMs), catastrophic forgetting remains a substantial challenge, where models lose previously acquired knowledge upon learning new information. Continual learning (CL) strategies have emerged as a potential solution to this problem, with replay-based techniques demonstrating superior performance in preserving learned knowledge. In this context, we introduce Gauss-Tin, a novel approach that integrates the replay strategy with a Gaussian mixture model to enhance the quality of sample selection during training, supplemented by instructional guidance to facilitate the generation of past learning. This method aims to improve LLMs' retention capabilities by strategically reinforcing important past learnings while accommodating new information. Our experimental results indicate a promising 6% improvement in retention metrics over traditional methods, suggesting that Gauss-Tin is an effective strategy for mitigating catastrophic forgetting in LLMs. This study underscores the potential of hybrid models in enhancing the robustness and adaptability of LLMs in dynamic learning environments.


【9】NeuronTune: Fine-Grained Neuron Modulation for Balanced Safety-Utility Alignment in LLMs
Link: https://arxiv.org/abs/2508.09473

Authors: n, Mayi Xu, Qiankun Pi, Jianhao Chen, Yuanyuan Zhu, Ming Zhong, Tieyun Qian
Abstract: Ensuring robust safety alignment while preserving utility is critical for the reliable deployment of Large Language Models (LLMs). However, current techniques fundamentally suffer from intertwined deficiencies: insufficient robustness against malicious attacks, frequent refusal of benign queries, and degradation in generated text quality and general task performance. The first two reflect deficits in robust safety, and the last constitutes utility impairment. We trace these limitations to the coarse-grained layer-wise interventions in existing methods. To resolve this, we propose NeuronTune, a fine-grained framework that dynamically modulates sparse neurons to achieve simultaneous safety-utility optimization. Our approach first identifies safety-critical and utility-preserving neurons across all layers via attribution, then employs meta-learning to adaptively amplify safety-neuron activations and suppress utility-neuron activations. Crucially, NeuronTune enables tunable adjustment of intervention scope via neuron-count thresholds, supporting flexible adaptation to security-critical or utility-priority scenarios. Extensive experimental results demonstrate that our method significantly outperforms existing state-of-the-art technologies, achieving superior model safety while maintaining excellent utility.


【10】EGGS-PTP: An Expander-Graph Guided Structured Post-training Pruning Method for Large Language Models
Link: https://arxiv.org/abs/2508.09471

Authors: rbachi, Zijun Sun, Yanning Shen
Abstract: As Large Language Models (LLMs) become more widely adopted and scale up in size, the computational and memory challenges involved in deploying these massive foundation models have grown increasingly severe. This underscores the urgent need to develop more efficient model variants. Faced with this challenge, the present work introduces EGGS-PTP: an Expander-Graph Guided Structured Post-training Pruning method. The proposed approach leverages graph theory to guide the design of N:M structured pruning, effectively reducing model size and computational demands. By incorporating concepts from expander graphs, EGGS-PTP ensures information flow within the pruned network, preserving essential model functionality. Extensive numerical experiments demonstrate that EGGS-PTP not only achieves significant acceleration and memory savings due to structured sparsity but also outperforms existing structured pruning techniques in terms of accuracy across various LLMs.
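
As background for the N:M structured pruning named in this abstract, the sketch below shows the generic N:M sparsity pattern using plain magnitude-based selection: in every consecutive group of M weights, only N are kept. This is a swapped-in baseline criterion, not the expander-graph-guided selection the paper proposes, and the function name and weight values are illustrative assumptions.

```python
def prune_n_m(weights, n=2, m=4):
    """Keep the n largest-magnitude weights in each consecutive group of m; zero the rest."""
    if len(weights) % m != 0:
        raise ValueError("row length must be a multiple of m")
    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n largest |w| within this group.
        order = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(order[:n])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned

# Hypothetical weight row with two groups of four.
row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
print(prune_n_m(row))  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Because every group has the same number of nonzeros, the pattern is regular enough for hardware to exploit, which is the source of the speed and memory savings structured sparsity is known for.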


【11】DeepFeatIoT: Unifying Deep Learned, Randomized, and LLM Features for Enhanced IoT Time Series Sensor Data Classification in Smart Industries
Link: https://arxiv.org/abs/2508.09468

Authors: Sakib Khan Inan, Kewen Liao
Comments: Accepted for publication at IJCAI 2025
Abstract: Internet of Things (IoT) sensors are ubiquitous technologies deployed across smart cities, industrial sites, and healthcare systems. They continuously generate time series data that enable advanced analytics and automation in industries. However, challenges such as the loss or ambiguity of sensor metadata, heterogeneity in data sources, varying sampling frequencies, inconsistent units of measurement, and irregular timestamps make raw IoT time series data difficult to interpret, undermining the effectiveness of smart systems. To address these challenges, we propose a novel deep learning model, DeepFeatIoT, which integrates learned local and global features with non-learned randomized convolutional kernel-based features and features from large language models (LLMs). This straightforward yet unique fusion of diverse learned and non-learned features significantly enhances IoT time series sensor data classification, even in scenarios with limited labeled data. Our model's effectiveness is demonstrated through its consistent and generalized performance across multiple real-world IoT sensor datasets from diverse critical application domains, outperforming state-of-the-art benchmark models. These results highlight DeepFeatIoT's potential to drive significant advancements in IoT analytics and support the development of next-generation smart systems.


【12】Teaching Code Refactoring Using LLMs
Link: https://arxiv.org/abs/2508.09332

Authors: airnar, Aarya Rajoju, Edward F. Gehringer
Comments: Accepted for presentation at the Frontiers in Education Conference, Nashville, Tennessee, USA, 2-5 November 2025
Abstract: This Innovative Practice full paper explores how Large Language Models (LLMs) can enhance the teaching of code refactoring in software engineering courses through real-time, context-aware feedback. Refactoring improves code quality but is difficult to teach, especially with complex, real-world codebases. Traditional methods like code reviews and static analysis tools offer limited, inconsistent feedback. Our approach integrates LLM-assisted refactoring into a course project using structured prompts to help students identify and address code smells such as long methods and low cohesion. Implemented in Spring 2025 in a long-lived OSS project, the intervention is evaluated through student feedback and planned analysis of code quality improvements. Findings suggest that LLMs can bridge theoretical and practical learning, supporting a deeper understanding of maintainability and refactoring principles.


【13】LLM Empowered Prototype Learning for Zero and Few-Shot Tasks on Tabular Data
Link: https://arxiv.org/abs/2508.09263

Authors: , Dongsheng Wang, He Zhao, Hangting Ye, Dandan Guo, Yi Chang
Abstract: Recent breakthroughs in large language models (LLMs) have opened the door to in-depth investigation of their potential in tabular data modeling. However, effectively utilizing advanced LLMs in few-shot and even zero-shot scenarios is still challenging. To this end, we propose a novel LLM-based prototype estimation framework for tabular learning. Our key idea is to query the LLM to generate feature values based on an example-free prompt, which relies solely on task and feature descriptions. With the feature values generated by the LLM, we can build a zero-shot prototype in a training-free manner, which can be further enhanced by fusing few-shot samples, avoiding training a classifier or fine-tuning the LLMs. Thanks to the example-free prompt and prototype estimation, our method bypasses the constraints brought by the example-based prompt, providing a scalable and robust framework. Extensive experiments demonstrate the effectiveness of our method in zero-shot and few-shot tabular learning.
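
To make the prototype idea in this abstract concrete, here is a rough sketch of nearest-prototype classification in which a per-class prototype is blended with the mean of a few labeled rows. The abstract does not give the fusion rule or distance, so the convex blend with weight `alpha`, the squared Euclidean distance, the class names, and every number below are assumptions for illustration; the prototype values merely stand in for what an LLM might return.

```python
def mean_vector(rows):
    """Column-wise mean of a non-empty list of equal-length feature rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fuse(zero_shot_proto, few_shot_rows, alpha=0.5):
    """Blend a zero-shot prototype with the mean of a few labeled samples (assumed rule)."""
    if not few_shot_rows:
        return list(zero_shot_proto)
    few_mean = mean_vector(few_shot_rows)
    return [alpha * z + (1 - alpha) * f for z, f in zip(zero_shot_proto, few_mean)]

def predict(x, prototypes):
    """Return the label whose prototype is closest to x in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(prototypes, key=lambda label: sq_dist(x, prototypes[label]))

# Hypothetical prototypes over two features (say, age and BMI); no training step is involved.
prototypes = {
    "low_risk": [30.0, 20.0],
    "high_risk": fuse([60.0, 35.0], [[70.0, 33.0], [66.0, 31.0]]),
}
print(prototypes["high_risk"])            # [64.0, 33.5]
print(predict([62.0, 34.0], prototypes))  # high_risk
print(predict([28.0, 21.0], prototypes))  # low_risk
```

A real tabular pipeline would also need feature scaling and handling of categorical columns, which this sketch leaves out.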


【14】Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing
Link: https://arxiv.org/abs/2508.09192

Authors: Chenkai Xu, Yijie Jin, Jiachun Jin, Hao Zhang, Zhijie Deng
Abstract: Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to autoregressive (AR) LLMs for text generation, with the potential to decode multiple tokens in a single iteration. However, none of the existing open-source dLLMs have achieved superior inference speed over AR LLMs of similar size. This paper breaks this barrier based on a simple and effective strategy named discrete diffusion forcing (D2F). D2F equips dLLMs with two key capabilities: (1) block-wise autoregressive generation to enable KV cache utilization; (2) prediction of following tokens without requiring completion of prior blocks for inter-block parallel decoding. In this way, the vanilla dLLMs are refurbished into an AR-diffusion hybrid paradigm for efficient inference. D2F can be implemented with an asymmetric distillation process based on pre-trained dLLMs. We further propose a pipelined parallel decoding algorithm, which enables a trade-off between efficiency and efficacy. Empirically, D2F dLLMs achieve more than 2.5× the inference speed of LLaMA3 and Qwen2.5 on GSM8K. Compared to vanilla dLLMs like LLaDA and Dream, the acceleration can be more than 50× while maintaining comparable output quality. The code is available at https://github.com/zhijie-group/Discrete-Diffusion-Forcing.


【15】From Values to Tokens: An LLM-Driven Framework for Context-aware Time Series Forecasting via Symbolic Discretization
Link: https://arxiv.org/abs/2508.09191

Authors: o, Shilong Zhang, Mingyue Cheng, Daoyu Wang, Tingyue Pan, Bokai Pan, Changqing Zhang, Shijin Wang
Abstract: Time series forecasting plays a vital role in supporting decision-making across a wide range of critical applications, including energy, healthcare, and finance. Despite recent advances, forecasting accuracy remains limited due to the challenge of integrating historical numerical sequences with contextual features, which often comprise unstructured textual data. To address this challenge, we propose TokenCast, an LLM-driven framework that leverages language-based symbolic representations as a unified intermediary for context-aware time series forecasting. Specifically, TokenCast employs a discrete tokenizer to transform continuous numerical sequences into temporal tokens, enabling structural alignment with language-based inputs. To bridge the semantic gap between modalities, both temporal and contextual tokens are embedded into a shared representation space via a pre-trained large language model (LLM), further optimized with autoregressive generative objectives. Building upon this unified semantic space, the aligned LLM is subsequently fine-tuned in a supervised manner to predict future temporal tokens, which are then decoded back into the original numerical space. Extensive experiments on diverse real-world datasets enriched with contextual features demonstrate the effectiveness and generalizability of TokenCast.
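
The encode-then-decode round trip described in this abstract (continuous values to discrete tokens and back to numbers) can be illustrated with the simplest possible discretizer, equal-width binning. This is a stand-in chosen for clarity; the abstract does not specify how TokenCast's discrete tokenizer works, and the function names and series below are assumptions.

```python
def fit_bins(values, num_bins):
    """Return (lowest value, bin width) for equal-width binning of a training series."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    if width == 0:
        width = 1.0  # constant series: fall back to an arbitrary width
    return lo, width

def encode(values, lo, width, num_bins):
    """Map each value to an integer token id in [0, num_bins - 1]."""
    tokens = []
    for v in values:
        idx = int((v - lo) / width)
        tokens.append(min(max(idx, 0), num_bins - 1))  # clamp out-of-range values
    return tokens

def decode(tokens, lo, width):
    """Map each token id back to the centre of its bin."""
    return [lo + (t + 0.5) * width for t in tokens]

# Hypothetical series, chosen only for illustration.
series = [0.0, 1.0, 2.0, 3.0, 4.0]
lo, width = fit_bins(series, num_bins=4)
tokens = encode(series, lo, width, num_bins=4)
print(tokens)                     # [0, 1, 2, 3, 3]
print(decode(tokens, lo, width))  # [0.5, 1.5, 2.5, 3.5, 3.5]
```

The round trip is lossy: each value is recovered only to within half a bin width, so the vocabulary size trades sequence fidelity against the number of distinct tokens a language model must handle.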


【16】Fine-Grained Safety Neurons with Training-Free Continual Projection to Reduce LLM Fine Tuning Risks
Link: https://arxiv.org/abs/2508.09190

Authors: Feifei Zhao, Dongcheng Zhao, Guobin Shen, Ping Wu, Yu Shi, Yi Zeng
Abstract: Fine-tuning-as-a-service injects domain-specific knowledge into large language models (LLMs), while challenging the original alignment mechanisms and introducing safety risks. A series of defense strategies have been proposed for the alignment, fine-tuning, and post-fine-tuning phases, where most post-fine-tuning defenses rely on coarse-grained safety layer mapping. These methods lack a comprehensive consideration of both safety layers and fine-grained neurons, limiting their ability to efficiently balance safety and utility. To address this, we propose the Fine-Grained Safety Neurons (FGSN) with Training-Free Continual Projection method to reduce the fine-tuning safety risks. FGSN inherently integrates the multi-scale interactions between safety layers and neurons, localizing sparser and more precise fine-grained safety neurons while minimizing interference with downstream task neurons. We then project the safety neuron parameters onto safety directions, improving model safety while aligning more closely with human preferences. Extensive experiments across multiple fine-tuned LLM models demonstrate that our method significantly reduces harmfulness scores and attack success rates with minimal parameter modifications, while preserving the model's utility. Furthermore, by introducing a task-specific, multi-dimensional heterogeneous safety neuron cluster optimization mechanism, we achieve continual defense and generalization capability against unforeseen emerging safety concerns.


【17】SVGen: Interpretable Vector Graphics Generation with Large Language Models
Link: https://arxiv.org/abs/2508.09168

Authors: g, Zhiyuan Zhao, Yuandong Liu, Da Zhang, Junyu Gao, Hao Sun, Xuelong Li
Abstract: Scalable Vector Graphics (SVG) is widely used in front-end development and UI/UX design due to its scalability, editability, and rendering efficiency. However, turning creative ideas into precise vector graphics remains a time-consuming challenge. To address this, we introduce SVG-1M, a large-scale dataset of high-quality SVGs paired with natural language descriptions. Through advanced data augmentation and annotation, we create well-aligned text-to-SVG training pairs, including a subset with Chain of Thought annotations for enhanced semantic guidance. Based on this dataset, we propose SVGen, an end-to-end model that generates SVG code from natural language inputs. Our approach ensures semantic accuracy and structural completeness, supported by curriculum learning and reinforcement learning optimization. Experiments show that SVGen outperforms general large models and traditional rendering methods in both effectiveness and efficiency. Code, model, and dataset are available on GitHub.


Graph-related (graph learning | graph neural networks | graph optimization, etc.) (13 papers)

【1】Dynamic Mixture-of-Experts for Incremental Graph Learning
Link: https://arxiv.org/abs/2508.09974

Authors: ong, Theodore Vasiloudis, Seongjun Yun, Han Xie, Xiang Song
Abstract: Graph incremental learning is a learning paradigm that aims to adapt trained models to continuously incremented graphs and data over time without the need for retraining on the full dataset. However, regular graph machine learning methods suffer from catastrophic forgetting when applied to incremental learning settings, where previously learned knowledge is overridden by new knowledge. Previous approaches have tried to address this by treating the previously trained model as an inseparable unit and using techniques to maintain old behaviors while learning new knowledge. These approaches, however, do not account for the fact that previously acquired knowledge at different timestamps contributes differently to learning new tasks. Some prior patterns can be transferred to help learn new data, while others may deviate from the new data distribution and be detrimental. To address this, we propose a dynamic mixture-of-experts (DyMoE) approach for incremental learning. Specifically, a DyMoE GNN layer adds new expert networks specialized in modeling the incoming data blocks. We design a customized regularization loss that utilizes data sequence information so existing experts can maintain their ability to solve old tasks while helping the new expert learn the new data effectively. As the number of data blocks grows over time, the computational cost of the full mixture-of-experts (MoE) model increases. To address this, we introduce a sparse MoE approach, where only the top-$k$ most relevant experts make predictions, significantly reducing the computation time. Our model achieves a 4.92% relative accuracy increase over the best baselines on class-incremental learning, showing the model's exceptional power.
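
The sparse top-$k$ routing mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain softmax over the $k$ largest gate scores; the function names, the toy scalar "experts", and the gate scores are hypothetical and are not taken from the DyMoE paper.

```python
import math

def top_k_gate(scores, k):
    """Sparse mixture weights: softmax over the k largest scores, 0 elsewhere."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k highest-scoring experts (ties keep their original order).
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)  # subtract the max for numerical stability
    exp = {i: math.exp(scores[i] - m) for i in top}
    z = sum(exp.values())
    return [exp[i] / z if i in exp else 0.0 for i in range(len(scores))]

def sparse_moe(x, experts, gate_scores, k):
    """Evaluate and combine only the experts that received a non-zero weight."""
    weights = top_k_gate(gate_scores, k)
    return sum(w * f(x) for w, f in zip(weights, experts) if w > 0.0)

# Hypothetical experts: simple scalar functions standing in for GNN expert networks.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x]
weights = top_k_gate([0.1, 2.0, 2.0], k=2)
y = sparse_moe(3.0, experts, gate_scores=[0.1, 2.0, 2.0], k=2)
```

Because experts outside the top $k$ get weight zero and are never evaluated, the cost per prediction depends on $k$ rather than on the total number of experts.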


【2】GraphTreeGen: Subtree-Centric Approach to Efficient and Supervised Graph Generation
Link: https://arxiv.org/abs/2508.09710

Authors: o, Islem Rekik
Abstract: Brain connectomes, representing neural connectivity as graphs, are crucial for understanding brain organization but costly and time-consuming to acquire, motivating generative approaches. Recent advances in graph generative modeling offer a data-driven alternative, enabling synthetic connectome generation and reducing dependence on large neuroimaging datasets. However, current models face key limitations: (i) compressing the whole graph into a single latent code (e.g., VGAEs) blurs fine-grained local motifs; (ii) relying on rich node attributes rarely available in connectomes reduces reconstruction quality; (iii) edge-centric models emphasize topology but overlook accurate edge-weight prediction, harming quantitative fidelity; and (iv) computationally expensive designs (e.g., edge-conditioned convolutions) impose high memory demands, limiting scalability. We propose GraphTreeGen (GTG), a subtree-centric generative framework for efficient, accurate connectome synthesis. GTG decomposes each connectome into entropy-guided k-hop trees capturing informative local structure, encoded by a shared GCN. A bipartite message-passing layer fuses subtree embeddings with global node features, while a dual-branch decoder jointly predicts edge existence and weights to reconstruct the adjacency matrix. GTG outperforms state-of-the-art baselines in self-supervised tasks and remains competitive in supervised settings, delivering higher structural fidelity and more precise weights with far less memory. Its modular design enables extensions to connectome super-resolution and cross-modality synthesis. Code: https://github.com/basiralab/GTG/


【3】Physics- and geometry-aware spatio-spectral graph neural operator for time-independent and time-dependent PDEs
Link: https://arxiv.org/abs/2508.09627

Authors: Sarkar, Souvik Chakraborty
Abstract: Solving partial differential equations (PDEs) efficiently and accurately remains a cornerstone challenge in science and engineering, especially for problems involving complex geometries and limited labeled data. We introduce a Physics- and Geometry-Aware Spatio-Spectral Graph Neural Operator ($\pi$G-Sp$^2$GNO) for learning the solution operators of time-independent and time-dependent PDEs. The proposed approach first improves upon the recently developed Sp$^2$GNO by enabling geometry awareness and subsequently exploits the governing physics to learn the underlying solution operator in a simulation-free setup. While the spatio-spectral structure present in the proposed architecture allows multiscale learning, two separate strategies for enabling geometry awareness are introduced in this paper. For time-dependent problems, we also introduce a novel hybrid physics-informed loss function that combines a higher-order time-marching scheme with an upscaled-theory-inspired stochastic projection scheme. This allows accurate integration of the physics information into the loss function. The performance of the proposed approach is illustrated on a number of benchmark examples involving regular and complex domains, variation in geometry during inference, and time-independent and time-dependent problems. The results obtained illustrate the efficacy of the proposed approach as compared to the state-of-the-art physics-informed neural operator algorithms in the literature.


【4】Time-Aware and Transition-Semantic Graph Neural Networks for Interpretable Predictive Business Process Monitoring
Link: https://arxiv.org/abs/2508.09527

Authors: , Ernesto Damiani
Comments: 32 pages
Abstract: Predictive Business Process Monitoring (PBPM) aims to forecast future events in ongoing cases based on historical event logs. While Graph Neural Networks (GNNs) are well suited to capture structural dependencies in process data, existing GNN-based PBPM models remain underdeveloped. Most rely either on short prefix subgraphs or global architectures that overlook temporal relevance and transition semantics. We propose a unified, interpretable GNN framework that advances the state of the art along three key axes. First, we compare prefix-based Graph Convolutional Networks (GCNs) and full-trace Graph Attention Networks (GATs) to quantify the performance gap between localized and global modeling. Second, we introduce a novel time-decay attention mechanism that constructs dynamic, prediction-centered windows, emphasizing temporally relevant history and suppressing noise. Third, we embed transition-type semantics into edge features to enable fine-grained reasoning over structurally ambiguous traces. Our architecture includes multilevel interpretability modules, offering diverse visualizations of attention behavior. Evaluated on five benchmarks, the proposed models achieve competitive Top-k accuracy and DL scores without per-dataset tuning. By addressing architectural, temporal, and semantic gaps, this work presents a robust, generalizable, and explainable solution for next event prediction in PBPM.
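
One way to picture a time-decay attention weighting is the sketch below. It assumes the decay enters as an additive penalty proportional to each event's age (equivalent to multiplying the unnormalized weight by an exponential decay); this form, the function name, and the parameter `decay_rate` are illustrative assumptions, and the paper's actual mechanism may differ.

```python
import math

def time_decay_attention(scores, timestamps, t_pred, decay_rate):
    """Softmax attention weights with a penalty proportional to each event's age."""
    ages = [t_pred - t for t in timestamps]
    if any(a < 0 for a in ages):
        raise ValueError("events must not be later than the prediction time")
    # Older events get lower logits, so they contribute less after the softmax.
    logits = [s - decay_rate * a for s, a in zip(scores, ages)]
    m = max(logits)  # subtract the max for numerical stability
    exp = [math.exp(l - m) for l in logits]
    z = sum(exp)
    return [e / z for e in exp]

# Three events with identical content scores but different ages (toy values).
weights = time_decay_attention(
    scores=[1.0, 1.0, 1.0],
    timestamps=[0.0, 5.0, 9.0],
    t_pred=10.0,
    decay_rate=0.5,
)
```

With equal content scores, the most recent event receives the largest weight, which is the "emphasize temporally relevant history" behaviour the abstract describes.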


【5】Causal Graph Profiling via Structural Divergence for Robust Anomaly Detection in Cyber-Physical Systems
Link: https://arxiv.org/abs/2508.09504

Authors: esh Malarkkan, Haoyue Bai, Dongjie Wang, Yanjie Fu
Comments: 7 Pages, 5 figures, Submission for ACM TKDD
Abstract: With the growing complexity of cyberattacks targeting critical infrastructures such as water treatment networks, there is a pressing need for robust anomaly detection strategies that account for both system vulnerabilities and evolving attack patterns. Traditional methods (statistical, density-based, and graph-based models) struggle with distribution shifts and class imbalance in multivariate time series, often leading to high false positive rates. To address these challenges, we propose CGAD, a Causal Graph-based Anomaly Detection framework designed for reliable cyberattack detection in public infrastructure systems. CGAD follows a two-phase supervised framework: causal profiling and anomaly scoring. First, it learns causal invariant graph structures representing the system's behavior under "Normal" and "Attack" states using Dynamic Bayesian Networks. Second, it employs structural divergence to detect anomalies via causal graph comparison by evaluating topological deviations in causal graphs over time. By leveraging causal structures, CGAD achieves superior adaptability and accuracy in non-stationary and imbalanced time series environments compared to conventional machine learning approaches. By uncovering causal structures beneath volatile sensor data, our framework not only detects cyberattacks with markedly higher precision but also redefines robustness in anomaly detection, proving resilience where traditional models falter under imbalance and drift. Our framework achieves substantial gains in F1 and ROC-AUC scores over best-performing baselines across four industrial datasets, demonstrating robust detection of delayed and structurally complex anomalies.
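
As a rough illustration of comparing causal graphs by their structure, the sketch below scores the difference between two directed edge sets with a Jaccard distance. This is a stand-in chosen for simplicity, not CGAD's actual divergence measure, and the sensor names are hypothetical.

```python
def structural_divergence(edges_a, edges_b):
    """Share of directed edges that appear in exactly one of the two graphs."""
    a, b = set(edges_a), set(edges_b)
    union = a | b
    if not union:
        return 0.0  # two empty graphs are treated as identical
    return len(a ^ b) / len(union)

# Hypothetical causal graphs: a learned "normal" profile and one observed window.
normal = {("pump", "flow"), ("flow", "level"), ("valve", "flow")}
window = {("pump", "flow"), ("level", "flow"), ("valve", "flow")}
score = structural_divergence(normal, window)
```

Here a single reversed causal edge changes two of the four distinct edges, so a threshold on the score could flag the window as anomalous.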


【6】Learn to Explore: Meta NAS via Bayesian Optimization Guided Graph Generation
Link: https://arxiv.org/abs/2508.09467

Authors: , Yanning Shen
Abstract: Neural Architecture Search (NAS) automates the design of high-performing neural networks but typically targets a single predefined task, thereby restricting its real-world applicability. To address this, Meta Neural Architecture Search (Meta-NAS) has emerged as a promising paradigm that leverages prior knowledge across tasks to enable rapid adaptation to new ones. Nevertheless, existing Meta-NAS methods often struggle with poor generalization, limited search spaces, or high computational costs. In this paper, we propose a novel Meta-NAS framework, GraB-NAS. Specifically, GraB-NAS first models neural architectures as graphs, and then a hybrid search strategy is developed to find and generate new graphs that lead to promising neural architectures. The search strategy combines global architecture search via Bayesian Optimization in the search space with local exploration for novel neural networks via gradient ascent in the latent space. Such a hybrid search strategy allows GraB-NAS to discover task-aware architectures with strong performance, even beyond the predefined search space. Extensive experiments demonstrate that GraB-NAS outperforms state-of-the-art Meta-NAS baselines, achieving better generalization and search effectiveness.


【7】Graph Neural Network and Transformer Integration for Unsupervised System Anomaly Discovery
Link: https://arxiv.org/abs/2508.09401

Authors: ing Gong, Zhihao Xue, Yujun Zou, Nia Qi, Yingnan Deng
Abstract: This study proposes an unsupervised anomaly detection method for distributed backend service systems, addressing practical challenges such as complex structural dependencies, diverse behavioral evolution, and the absence of labeled data. The method constructs a dynamic graph based on service invocation relationships and applies graph convolution to extract high-order structural representations from multi-hop topologies. A Transformer is used to model the temporal behavior of each node, capturing long-term dependencies and local fluctuations. During the feature fusion stage, a learnable joint embedding mechanism integrates structural and behavioral representations into a unified anomaly vector. A nonlinear mapping is then applied to compute anomaly scores, enabling an end-to-end detection process without supervision. Experiments on real-world cloud monitoring data include sensitivity analyses across different graph depths, sequence lengths, and data perturbations. Results show that the proposed method outperforms existing models on several key metrics, demonstrating stronger expressiveness and stability in capturing anomaly propagation paths and modeling dynamic behavior sequences, with high potential for practical deployment.


【8】RicciFlowRec: A Geometric Root Cause Recommender Using Ricci Curvature on Financial Graphs
Link: https://arxiv.org/abs/2508.09334

Authors: Sun, Anoushka Harit
Comments: Accepted at ACM RecSys 2025 (Late Breaking Results Track)
Abstract: We propose RicciFlowRec, a geometric recommendation framework that performs root cause attribution via Ricci curvature and flow on dynamic financial graphs. By modelling evolving interactions among stocks, macroeconomic indicators, and news, we quantify local stress using discrete Ricci curvature and trace shock propagation via Ricci flow. Curvature gradients reveal causal substructures, informing a structural risk-aware ranking function. Preliminary results on S&P 500 data with FinBERT-based sentiment show improved robustness and interpretability under synthetic perturbations. This ongoing work supports curvature-based attribution and early-stage risk-aware ranking, with plans for portfolio optimization and return forecasting. To our knowledge, RicciFlowRec is the first recommender to apply geometric flow-based reasoning in financial decision support.


【9】Exact Verification of Graph Neural Networks with Incremental Constraint Solving
Link: https://arxiv.org/abs/2508.09320

Authors: iu, Chia-Hsuan Lu, Marta Kwiatkowska
Abstract: Graph neural networks (GNNs) are increasingly employed in high-stakes applications, such as fraud detection or healthcare, but are susceptible to adversarial attacks. A number of techniques have been proposed to provide adversarial robustness guarantees, but support for commonly used aggregation functions in message-passing GNNs is still lacking. In this paper, we develop an exact (sound and complete) verification method for GNNs to compute guarantees against attribute and structural perturbations that involve edge addition or deletion, subject to budget constraints. Focusing on node classification tasks, our method employs constraint solving with bound tightening, and iteratively solves a sequence of relaxed constraint satisfaction problems while relying on incremental solving capabilities of solvers to improve efficiency. We implement GNNev, a versatile solver for message-passing neural networks, which supports three aggregation functions, sum, max and mean, with the latter two considered here for the first time. Extensive experimental evaluation of GNNev on two standard benchmarks (Cora and CiteSeer) and two real-world fraud datasets (Amazon and Yelp) demonstrates its usability and effectiveness, as well as superior performance compared to existing exact verification tools on sum-aggregated node classification tasks.
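
To make the three aggregation functions concrete, here is a minimal sketch of how a node could combine its neighbours' feature vectors, and how adding one edge (a structural perturbation) changes each result. The helper name and the toy feature values are hypothetical; this is not GNNev's code.

```python
def aggregate(neighbor_features, how):
    """Combine neighbour feature vectors dimension by dimension."""
    if not neighbor_features:
        raise ValueError("node has no neighbours to aggregate")
    columns = list(zip(*neighbor_features))  # group values by feature dimension
    if how == "sum":
        return [sum(c) for c in columns]
    if how == "max":
        return [max(c) for c in columns]
    if how == "mean":
        return [sum(c) / len(c) for c in columns]
    raise ValueError(f"unknown aggregation: {how}")

# Toy neighbourhood before and after an adversary adds one edge.
neighbours = [[1.0, 0.0], [3.0, 2.0]]
with_extra_edge = neighbours + [[5.0, -4.0]]

before = {how: aggregate(neighbours, how) for how in ("sum", "max", "mean")}
after = {how: aggregate(with_extra_edge, how) for how in ("sum", "max", "mean")}
```

An added edge shifts a sum by the new neighbour's features, affects a max only in dimensions where the new neighbour dominates, and rescales a mean because the divisor changes, which hints at why the abstract treats the three cases separately.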


【10】Blockchain Network Analysis using Quantum Inspired Graph Neural Networks & Ensemble Models
Link: https://arxiv.org/abs/2508.09237

Authors: mico, Daniel De Rosso, Ninad Dixit, Raul Salles de Padua, Samuel Palmer, Samuel Mugel, Román Orús, Holger Eble, Ali Abedi
Abstract: In the rapidly evolving domain of financial technology, the detection of illicit transactions within blockchain networks remains a critical challenge, necessitating robust and innovative solutions. This work proposes a novel approach that combines Quantum Inspired Graph Neural Networks (QI-GNN) with the flexibility to choose an ensemble model, using either QBoost or a classical model such as a Random Forest classifier. This system is tailored specifically for blockchain network analysis in anti-money laundering (AML) efforts. Our methodology to design this system incorporates a novel component, a Canonical Polyadic (CP) decomposition layer within the graph neural network framework, enhancing its capability to process and analyze complex data structures efficiently. Our technical approach has undergone rigorous evaluation against classical machine learning implementations, achieving an F2 score of 74.8% in detecting fraudulent transactions. These results highlight the potential of quantum-inspired techniques, supplemented by the structural advancements of the CP layer, to not only match but potentially exceed traditional methods in complex network analysis for financial security. The findings advocate for a broader adoption and further exploration of quantum-inspired algorithms within the financial sector to effectively combat fraud.
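
For readers unfamiliar with the F2 score reported above, the sketch below computes the general F-beta score from precision and recall using the standard definition. The precision and recall values in the example are made up for illustration and are not results from the paper.

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score: a weighted harmonic mean that favours recall when beta > 1."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Illustrative values only: a detector that catches every fraud case
# but is right about just half of its alerts.
f1 = f_beta(0.5, 1.0, beta=1.0)
f2 = f_beta(0.5, 1.0, beta=2.0)
```

With beta set to 2, recall counts more than precision, which suits fraud detection, where a missed illicit transaction is usually costlier than a false alarm.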


【11】GSMT: Graph Fusion and Spatiotemporal TaskCorrection for Multi-Bus Trajectory Prediction
Link: https://arxiv.org/abs/2508.09227

Authors: Hwa Hui Tew, Junn Yong Loo, Susilawati, LiTong Liu, Fang Yu Leong, Xuewen Luo, Kar Keong Chin, Jia Jun Gan
Comments: This paper has been accepted by ITSC 2025
Abstract: Accurate trajectory prediction for buses is crucial in intelligent transportation systems, particularly within urban environments. In developing regions where access to multimodal data is limited, relying solely on onboard GPS data remains indispensable despite inherent challenges. To address this problem, we propose GSMT, a hybrid model that integrates a Graph Attention Network (GAT) with a sequence-to-sequence Recurrent Neural Network (RNN), and incorporates a task corrector capable of extracting complex behavioral patterns from large-scale trajectory data. The task corrector clusters historical trajectories to identify distinct motion patterns and fine-tunes the predictions generated by the GAT and RNN. Specifically, GSMT fuses dynamic bus information and static station information through embedded hybrid networks to perform trajectory prediction, and applies the task corrector for secondary refinement after the initial predictions are generated. This two-stage approach enables multi-node trajectory prediction among buses operating in dense urban traffic environments under complex conditions. Experiments conducted on a real-world dataset from Kuala Lumpur, Malaysia, demonstrate that our method significantly outperforms existing approaches, achieving superior performance in both short-term and long-term trajectory prediction tasks.


【12】scAGC: Learning Adaptive Cell Graphs with Contrastive Guidance for Single-Cell Clustering
Link: https://arxiv.org/abs/2508.09180

Authors: Jie Fu, Xinlin Zhuang, Haolin Yang, Xinpeng Ling, Tong Cheng, Haochen xue, Imran Razzak, Zhili Chen
Abstract: Accurate cell type annotation is a crucial step in analyzing single-cell RNA sequencing (scRNA-seq) data, which provides valuable insights into cellular heterogeneity. However, due to the high dimensionality and prevalence of zero elements in scRNA-seq data, traditional clustering methods face significant statistical and computational challenges. While some advanced methods use graph neural networks to model cell-cell relationships, they often depend on static graph structures that are sensitive to noise and fail to capture the long-tailed distribution inherent in single-cell populations. To address these limitations, we propose scAGC, a single-cell clustering method that learns adaptive cell graphs with contrastive guidance. Our approach optimizes feature representations and cell graphs simultaneously in an end-to-end manner. Specifically, we introduce a topology-adaptive graph autoencoder that leverages a differentiable Gumbel-Softmax sampling strategy to dynamically refine the graph structure during training. This adaptive mechanism mitigates the problem of a long-tailed degree distribution by promoting a more balanced neighborhood structure. To model the discrete, over-dispersed, and zero-inflated nature of scRNA-seq data, we integrate a Zero-Inflated Negative Binomial (ZINB) loss for robust feature reconstruction. Furthermore, a contrastive learning objective is incorporated to regularize the graph learning process and prevent abrupt changes in the graph topology, ensuring stability and enhancing convergence. Comprehensive experiments on 9 real scRNA-seq datasets demonstrate that scAGC consistently outperforms other state-of-the-art methods, yielding the best NMI and ARI scores on 9 and 7 datasets, respectively. Our code is available at Anonymous Github.
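
The ZINB loss named in this abstract is the negative log-likelihood of a zero-inflated negative binomial distribution. The sketch below evaluates it for a single count, assuming the common mean/dispersion parameterization (`mu`, `theta`) with dropout probability `pi`; the function names and parameter values are illustrative and are not taken from scAGC's code.

```python
import math

def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with mean mu, dispersion theta."""
    return (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
            + theta * math.log(theta / (theta + mu))
            + x * math.log(mu / (theta + mu)))

def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of x under a zero-inflated negative binomial."""
    if x == 0:
        # A zero can come from dropout (probability pi) or from the NB itself.
        p_zero = pi + (1 - pi) * math.exp(nb_log_pmf(0, mu, theta))
        return -math.log(p_zero)
    # A positive count can only come from the NB component.
    return -(math.log(1 - pi) + nb_log_pmf(x, mu, theta))

# Illustrative parameters: the probabilities over all counts should sum to about 1.
total = sum(math.exp(-zinb_nll(x, mu=2.0, theta=1.5, pi=0.3)) for x in range(200))
p_zero_zinb = math.exp(-zinb_nll(0, mu=2.0, theta=1.5, pi=0.3))
p_zero_nb = math.exp(nb_log_pmf(0, mu=2.0, theta=1.5))
```

The extra mass at zero (`p_zero_zinb` exceeds `p_zero_nb`) is what lets the model account for the many zero entries in scRNA-seq count matrices.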


【13】A pseudo-inverse of a line graph
Link: https://arxiv.org/abs/2508.09412

Authors: Kandanaarachchi, Philip Kilby, Cheng Soon Ong
Abstract: Line graphs are an alternative representation of graphs where each vertex of the original (root) graph becomes an edge. However, not all graphs have a corresponding root graph, hence the transformation from graphs to line graphs is not invertible. We investigate the case when there is a small perturbation in the space of line graphs, and try to recover the corresponding root graph, essentially defining the inverse of the line graph operation. We propose a linear integer program that edits the smallest number of edges in the line graph that allows a root graph to be found. We use the spectral norm to theoretically prove that such a pseudo-inverse operation is well behaved. Illustrative empirical experiments on Erdős-Rényi graphs show that our theoretical results work in practice.
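
The forward operation that the paper seeks to invert can be sketched in a few lines under the standard definition: the line graph has one node per edge of the root graph, and two such nodes are adjacent when the corresponding root edges share an endpoint. The function name and example graphs are illustrative, and the sketch assumes a simple undirected root graph.

```python
from itertools import combinations

def line_graph(edges):
    """Line graph of a simple undirected graph, as a set of pairs of root edges."""
    nodes = [frozenset(e) for e in edges]  # each root edge becomes a node
    # Two nodes are linked when the root edges have an endpoint in common.
    return {frozenset((a, b)) for a, b in combinations(nodes, 2) if a & b}

path = [(1, 2), (2, 3), (3, 4)]
triangle = [(1, 2), (2, 3), (1, 3)]
star = [(0, 1), (0, 2), (0, 3)]

lg_path = line_graph(path)
lg_triangle = line_graph(triangle)
lg_star = line_graph(star)
```

The triangle and the three-edge star both map to a triangle, a classic case in which the root graph is not unique.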


Transformer (3 papers)

【1】A Signer-Invariant Conformer and Multi-Scale Fusion Transformer for Continuous Sign Language Recognition
Link: https://arxiv.org/abs/2508.09372

Authors: ul Haque, Md. Milon Islam, S M Taslim Uddin Raju, Fakhri Karray
Comments: Accepted for the IEEE/CVF International Conference on Computer Vision (ICCV), Honolulu, Hawaii, USA. 1st MSLR Workshop 2025
Abstract: Continuous Sign Language Recognition (CSLR) faces multiple challenges, including significant inter-signer variability and poor generalization to novel sentence structures. Traditional solutions frequently fail to handle these issues efficiently. To overcome these constraints, we propose a dual-architecture framework. For the Signer-Independent (SI) challenge, we propose a Signer-Invariant Conformer that combines convolutions with multi-head self-attention to learn robust, signer-agnostic representations from pose-based skeletal keypoints. For the Unseen-Sentences (US) task, we designed a Multi-Scale Fusion Transformer with a novel dual-path temporal encoder that captures fine-grained posture dynamics, enabling the model to comprehend novel grammatical compositions. Experiments on the challenging Isharah-1000 dataset establish a new standard for both CSLR benchmarks. The proposed conformer architecture achieves a Word Error Rate (WER) of 13.07% on the SI challenge, a reduction of 13.53% from the state-of-the-art. On the US task, the transformer model scores a WER of 47.78%, surpassing previous work. In the SignEval 2025 CSLR challenge, our team placed 2nd in the US task and 4th in the SI task, demonstrating the performance of these models. The findings validate our key hypothesis: that developing task-specific networks designed for the particular challenges of CSLR leads to considerable performance improvements and establishes a new baseline for further research. The source code is available at: https://github.com/rezwanh001/MSLR-Pose86K-CSLR-Isharah.
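
Word Error Rate, the metric quoted above, is the word-level edit distance between the recognized sequence and the reference, divided by the reference length. A minimal sketch follows; the function name and the example gloss sequences are made up for illustration.

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # delete a reference word
                           cur[j - 1] + 1,           # insert a hypothesis word
                           prev[j - 1] + (r != h)))  # substitute or match
        prev = cur
    return prev[-1] / len(ref)

# One substitution and one deletion against a four-word reference.
score = wer("I GO SCHOOL TODAY", "I WALK SCHOOL")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.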


【2】To Theoretically Understand Transformer-Based In-Context Learning for Optimizing CSMA
Link: https://arxiv.org/abs/2508.09146

Authors: ao, Hongbo Li, Lingjie Duan
Abstract: The binary exponential backoff scheme is widely used in WiFi 7 and still incurs poor throughput performance under dynamic channel environments. Recent model-based approaches (e.g., non-persistent and $p$-persistent CSMA) simply optimize backoff strategies under a known and fixed node density, still leading to a large throughput loss due to inaccurate node density estimation. This paper is the first to propose LLM transformer-based in-context learning (ICL) theory for optimizing channel access. We design a transformer-based ICL optimizer to pre-collect collision-threshold data examples and a query collision case. These are assembled into a prompt that the transformer takes as input to learn the pattern, after which it generates a predicted contention window threshold (CWT). To train the transformer for effective ICL, we develop an efficient algorithm and guarantee a near-optimal CWT prediction within limited training steps. As it may be hard to gather perfect data examples for ICL in practice, we further extend to allow erroneous data input in the prompt. We prove that our optimizer maintains minimal prediction and throughput deviations from the optimal values. Experimental results on NS-3 further demonstrate our approach's fast convergence and near-optimal throughput over existing model-based and DRL-based approaches under unknown node densities.
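
For context, the classical binary exponential backoff rule that this paper aims to improve on can be sketched as follows: the contention window doubles after every collision up to a cap, and the station waits a random number of slots drawn from that window. The window bounds `cw_min=16` and `cw_max=1024` are illustrative defaults, not values from the paper, and the sketch does not include the transformer-based optimizer.

```python
import random

def contention_window(collisions, cw_min=16, cw_max=1024):
    """Window size after consecutive collisions: doubles each time, capped at cw_max."""
    if collisions < 0:
        raise ValueError("collisions must be non-negative")
    return min(cw_min * 2 ** collisions, cw_max)

def draw_backoff(collisions, rng, cw_min=16, cw_max=1024):
    """Pick a uniformly random number of idle slots to wait, in [0, window)."""
    return rng.randrange(contention_window(collisions, cw_min, cw_max))

rng = random.Random(0)  # seeded so the example is reproducible
windows = [contention_window(c) for c in range(8)]
waits = [draw_backoff(c, rng) for c in range(8)]
```

The rule reacts only to a station's own collision history, which is one reason it can be far from optimal when the number of contending nodes is unknown or changing.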


【3】Efficient Real-Time Aircraft ETA Prediction via Feature Tokenization Transformer
Link: https://arxiv.org/abs/2508.09144

Authors: ang, Yicheng Zhang, Yifang Yin, Sheng Zhang, Yi Zhang
Comments: 9 pages, 9 figures, published in the conference "US-Europe Air Transportation Research & Development Symposium 2025"
Abstract: Estimated time of arrival (ETA) for airborne aircraft in real-time is crucial for arrival management in aviation, particularly for runway sequencing. Given the rapidly changing airspace context, the ETA prediction efficiency is as important as its accuracy in a real-time arrival aircraft management system. In this study, we utilize a feature tokenization-based Transformer model to efficiently predict aircraft ETA. Feature tokenization projects raw inputs to latent spaces, while the multi-head self-attention mechanism in the Transformer captures important aspects of the projections, alleviating the need for complex feature engineering. Moreover, the Transformer's parallel computation capability allows it to handle ETA requests at a high frequency, i.e., 1 Hz, which is essential for a real-time arrival management system. The model inputs include raw data, such as aircraft latitude, longitude, ground speed, theta degree for the airport, day and hour from track data, the weather context, and aircraft wake turbulence category. With a data sampling rate of 1 Hz, the ETA prediction is updated every second. We apply the proposed aircraft ETA prediction approach to Singapore Changi Airport (ICAO Code: WSSS) using one-month Automatic Dependent Surveillance-Broadcast (ADS-B) data from October 1 to October 31, 2022. In the experimental evaluation, the ETA modeling covers all aircraft within a range of 10 NM to 300 NM from WSSS. The results show that our proposed method outperforms the commonly used boosting-tree-based model, improving accuracy by 7% compared to XGBoost, while requiring only 39% of its computing time. Experimental results also indicate that, with 40 aircraft in the airspace at a given timestamp, the ETA inference time is only 51.7 microseconds, making it promising for real-time arrival management systems.
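
Feature tokenization can be pictured with the sketch below, which assumes the common per-feature linear form in which each scalar input `x_j` becomes a `d`-dimensional token `x_j * w_j + b_j`. Whether the paper uses exactly this form is an assumption, and the weights, biases, and feature values are toy numbers rather than learned parameters.

```python
def tokenize_features(values, weights, biases):
    """Map each scalar feature x_j to a d-dimensional token x_j * w_j + b_j."""
    if not (len(values) == len(weights) == len(biases)):
        raise ValueError("one weight vector and one bias vector are needed per feature")
    return [
        [x * w + b for w, b in zip(w_j, b_j)]
        for x, w_j, b_j in zip(values, weights, biases)
    ]

# Two hypothetical scalar inputs embedded into 3-dimensional tokens.
tokens = tokenize_features(
    values=[2.0, -1.0],
    weights=[[1.0, 0.5, 0.0], [0.0, 2.0, 1.0]],
    biases=[[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]],
)
```

The resulting list of tokens, one per input feature, is what a self-attention layer would then process, so interactions between features are learned rather than hand-engineered.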


GAN | Adversarial | Attack | Generation (13 papers)

【1】Story2Board: A Training-Free Approach for Expressive Storyboard Generation
Link: https://arxiv.org/abs/2508.09983

Authors: kevich, Matan Levy, Omri Avrahami, Dvir Samuel, Dani Lischinski
Comments: Project page is available at this https URL
Abstract: We present Story2Board, a training-free framework for expressive storyboard generation from natural language. Existing methods narrowly focus on subject identity, overlooking key aspects of visual storytelling such as spatial composition, background evolution, and narrative pacing. To address this, we introduce a lightweight consistency framework composed of two components: Latent Panel Anchoring, which preserves a shared character reference across panels, and Reciprocal Attention Value Mixing, which softly blends visual features between token pairs with strong reciprocal attention. Together, these mechanisms enhance coherence without architectural changes or fine-tuning, enabling state-of-the-art diffusion models to generate visually diverse yet consistent storyboards. To structure generation, we use an off-the-shelf language model to convert free-form stories into grounded panel-level prompts. To evaluate, we propose the Rich Storyboard Benchmark, a suite of open-domain narratives designed to assess layout diversity and background-grounded storytelling, in addition to consistency. We also introduce a new Scene Diversity metric that quantifies spatial and pose variation across storyboards. Our qualitative and quantitative results, as well as a user study, show that Story2Board produces more dynamic, coherent, and narratively engaging storyboards than existing baselines.


【2】Generation of Indian Sign Language Letters, Numbers, and Words
Link: https://arxiv.org/abs/2508.09522

Authors: ar Yadav, Nishant Kumar, Rathna G N
Comments: 6 pages, 5 figures, 2024 International Conference on Intelligent Algorithms for Computational Intelligence Systems (IACIS)
Abstract: Sign language, which contains hand movements, facial expressions and bodily gestures, is a significant medium for communicating with hard-of-hearing people. A well-trained sign language community communicates easily, but those who don't know sign language face significant challenges. Recognition and generation are basic communication methods between hearing and hard-of-hearing individuals. Despite progress in recognition, sign language generation still needs to be explored. The Progressive Growing of Generative Adversarial Network (ProGAN) excels at producing high-quality images, while the Self-Attention Generative Adversarial Network (SAGAN) generates feature-rich images at medium resolutions. Balancing resolution and detail is crucial for sign language image generation. We are developing a Generative Adversarial Network (GAN) variant that combines both models to generate feature-rich, high-resolution, and class-conditional sign language images. Our modified attention-based model generates high-quality images of Indian Sign Language letters, numbers, and words, outperforming the traditional ProGAN in Inception Score (IS) and Fréchet Inception Distance (FID), with improvements of 3.2 and 30.12, respectively. Additionally, we are publishing a large dataset incorporating high-quality images of Indian Sign Language alphabets, numbers, and 129 words.


【3】Understanding Dementia Speech Alignment with Diffusion-Based Image Generation
Link: https://arxiv.org/abs/2508.09385

Authors: astasios Lepipas, Dominika Woszczyk, Yiying Guan, Soteris Demetriou
Comments: Paper accepted at Interspeech 2025
Abstract: Text-to-image models generate highly realistic images based on natural language descriptions, and millions of users use them to create and share images online. While it is expected that such models can align input text and generated image in the same latent space, little has been done to understand whether this alignment is possible between pathological speech and generated images. In this work, we examine the ability of such models to align dementia-related speech information with the generated images and develop methods to explain this alignment. Surprisingly, we found that dementia detection is possible from generated images alone, achieving 75% accuracy on the ADReSS dataset. We then leverage explainability methods to show which parts of the language contribute to the detection.


【4】Constrained Black-Box Attacks Against Multi-Agent Reinforcement Learning
Link: https://arxiv.org/abs/2508.09275

Authors: am, Jamal Bentahar, Mustapha Hedabou
Comments: Under review in TNNLS
Abstract: Collaborative multi-agent reinforcement learning (c-MARL) has rapidly evolved, offering state-of-the-art algorithms for real-world applications, including sensitive domains. However, a key challenge to its widespread adoption is the lack of a thorough investigation into its vulnerabilities to adversarial attacks. Existing work predominantly focuses on training-time attacks or unrealistic scenarios, such as access to policy weights or the ability to train surrogate policies. In this paper, we investigate new vulnerabilities under more realistic and constrained conditions, assuming an adversary can only collect and perturb the observations of deployed agents. We also consider scenarios where the adversary has no access at all. We propose simple yet highly effective algorithms for generating adversarial perturbations designed to misalign how victim agents perceive their environment. Our approach is empirically validated on three benchmarks and 22 environments, demonstrating its effectiveness across diverse algorithms and environments. Furthermore, we show that our algorithm is sample-efficient, requiring only 1,000 samples compared to the millions needed by previous methods.


【5】GANime: Generating Anime and Manga Character Drawings from Sketches with Deep Learning
Link: https://arxiv.org/abs/2508.09207

Authors: obert Yang
Abstract: The process of generating fully colorized drawings from sketches is a large, usually costly bottleneck in the manga and anime industry. In this study, we examine multiple models for image-to-image translation between anime characters and their sketches, including Neural Style Transfer, C-GAN, and CycleGAN. By assessing them qualitatively and quantitatively, we find that C-GAN is the most effective model, able to produce high-quality, high-resolution images close to those created by humans.


【6】Multi-Objective Instruction-Aware Representation Learning in Procedural Content Generation RL
Link: https://arxiv.org/abs/2508.09193

Authors: Kim, In-Chang Baek, Seo-Young Lee, Geum-Hwan Hwang, Kyung-Joong Kim
Comments: 5 pages, 3 figures
Abstract: Recent advancements in generative modeling emphasize the importance of natural language as a highly expressive and accessible modality for controlling content generation. However, existing instructed reinforcement learning for procedural content generation (IPCGRL) methods often struggle to leverage the expressive richness of textual input, especially under complex, multi-objective instructions, leading to limited controllability. To address this problem, we propose MIPCGRL, a multi-objective representation learning method for instructed content generators, which incorporates sentence embeddings as conditions. MIPCGRL effectively trains a multi-objective embedding space by incorporating multi-label classification and multi-head regression networks. Experimental results show that the proposed method achieves up to a 13.8% improvement in controllability with multi-objective instructions. The ability to process complex instructions enables more expressive and flexible content generation.


【7】Generating Feasible and Diverse Synthetic Populations Using Diffusion Models
Link: https://arxiv.org/abs/2508.09164

Authors: Peng Lu, Qing Feng
Abstract: Population synthesis is a critical task that involves generating synthetic yet realistic representations of populations. It is a fundamental problem in agent-based modeling (ABM), which has become the standard for analyzing intelligent transportation systems. The synthetic population serves as the primary input for ABM transportation simulation, with traveling agents represented by population members. However, when the number of attributes describing agents becomes large, survey data often cannot densely support the joint distribution of the attributes in the population due to the curse of dimensionality. This sparsity makes it difficult to accurately model and produce the population. Interestingly, deep generative models trained on available sample data can potentially synthesize attribute combinations that are present in the actual population but do not exist in the sample data (called sampling zeros). Nevertheless, this comes at the cost of falsely generating infeasible attribute combinations that do not exist in the population (called structural zeros). In this study, a novel diffusion model-based population synthesis method is proposed to estimate the underlying joint distribution of a population. This approach enables the recovery of numerous missing sampling zeros while keeping the generated structural zeros minimal. Our method is compared with other recently proposed approaches, such as Variational Autoencoder (VAE) and Generative Adversarial Network (GAN) approaches, which have shown success in high-dimensional tabular population synthesis. We assess the performance of the synthesized outputs using a range of metrics, including marginal distribution similarity, feasibility, and diversity. The results demonstrate that our proposed method outperforms previous approaches in achieving a better balance between the feasibility and diversity of the synthesized population.
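The two error types named in this abstract, sampling zeros and structural zeros, reduce to set operations over attribute combinations. The sketch below is an illustration under stated assumptions, not the authors' evaluation code: the function name `zero_counts` and the toy attribute tuples are hypothetical, and in practice the true population is not fully observed, so these counts can only be computed on benchmark data where it is known.

```python
def zero_counts(population, sample, generated):
    """Count recovered sampling zeros and generated structural zeros.

    Each argument is an iterable of attribute-combination tuples.
    """
    population, sample, generated = set(population), set(sample), set(generated)
    # Sampling zeros: combinations that exist in the population but not in the sample.
    sampling_zeros = population - sample
    # Recovered sampling zeros: those the generator managed to produce.
    recovered = generated & sampling_zeros
    # Structural zeros: generated combinations that do not exist in the population.
    structural = generated - population
    return {
        "sampling_zeros_total": len(sampling_zeros),
        "sampling_zeros_recovered": len(recovered),
        "structural_zeros_generated": len(structural),
    }


# Hypothetical toy data: (age group, travel mode).
population = {("adult", "car"), ("adult", "bus"), ("child", "bus")}
sample = {("adult", "car")}
generated = {("adult", "car"), ("adult", "bus"), ("child", "car")}
result = zero_counts(population, sample, generated)
```

A generator that balances diversity and feasibility raises `sampling_zeros_recovered` while keeping `structural_zeros_generated` low.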


【8】An Unsupervised Deep XAI Framework for Localization of Concurrent Replay Attacks in Nuclear Reactor Signals
Link: https://arxiv.org/abs/2508.09162

Authors: nos Vasili, Zachery T. Dahm, William Richards, Stylianos Chatzidakis
Abstract: Next-generation advanced nuclear reactors are expected to be smaller in both size and power output, relying extensively on fully digital instrumentation and control systems. These reactors will generate a large flow of information in the form of multivariate time series data, simultaneously conveying various nonlinear cyber-physical, process, control, sensor, and operational states. Ensuring data integrity against deception attacks is becoming increasingly important for networked communication and is a requirement for safe and reliable operation. Current efforts to address replay attacks almost universally focus on watermarking or supervised anomaly detection approaches, without further identifying and characterizing the root cause of the anomaly. In addition, these approaches rely mostly on synthetic data with uncorrelated Gaussian process and measurement noise and full state feedback, or are limited to univariate signals, signal stationarity, linear quadratic regulators, or other linear time-invariant state-space models, which may fail to capture any unmodeled system dynamics. In the realm of regulated nuclear cyber-physical systems, additional work is needed on the characterization of replay attacks and the explainability of predictions using real data. Here, we propose an unsupervised explainable AI (XAI) framework based on a combination of an autoencoder and a customized windowSHAP algorithm to fully characterize real-time replay attacks of increasing complexity, i.e., detection, source identification, timing, and type, during a dynamic, time-evolving reactor process. The proposed XAI framework was benchmarked on several real-world datasets from Purdue's nuclear reactor PUR-1, with up to six signals being replayed concurrently. In all cases, the XAI framework was able to detect and identify the source and number of signals being replayed and the duration of the falsification with 95 percent or better accuracy.


【9】EvaDrive: Evolutionary Adversarial Policy Optimization for End-to-End Autonomous Driving
Link: https://arxiv.org/abs/2508.09158

Authors: o, Kangan Qian, Hao Ye, Yang Zhong, Ziang Luo, Sicong Jiang, Zilin Huang, Yangyi Fang, Jinyu Miao, Zheng Fu, Yunlong Wang, Kun Jiang, Diange Yang, Rui Fan, Baoyun Peng
Abstract: Autonomous driving faces significant challenges in achieving human-like iterative decision-making, which continuously generates, evaluates, and refines trajectory proposals. Current generation-evaluation frameworks isolate trajectory generation from quality assessment, preventing the iterative refinement essential for planning, while reinforcement learning methods collapse multi-dimensional preferences into scalar rewards, obscuring critical trade-offs and yielding scalarization bias. To overcome these issues, we present EvaDrive, a novel multi-objective reinforcement learning framework that establishes genuine closed-loop co-evolution between trajectory generation and evaluation via adversarial optimization. EvaDrive frames trajectory planning as a multi-round adversarial game. In this game, a hierarchical generator continuously proposes candidate paths by combining autoregressive intent modeling for temporal causality with diffusion-based refinement for spatial flexibility. These proposals are then rigorously assessed by a trainable multi-objective critic that explicitly preserves diverse preference structures without collapsing them into a single scalarization bias. This adversarial interplay, guided by a Pareto frontier selection mechanism, enables iterative multi-round refinement, effectively escaping local optima while preserving trajectory diversity. Extensive experiments on the NAVSIM and Bench2Drive benchmarks demonstrate state-of-the-art performance, achieving 94.9 PDMS on NAVSIM v1 (surpassing DiffusionDrive by 6.8, DriveSuprim by 5.0, and TrajHF by 0.9) and a Driving Score of 64.96 on Bench2Drive. EvaDrive generates diverse driving styles via dynamic weighting without external preference data, introducing a closed-loop adversarial framework for human-like iterative decision-making and offering a novel scalarization-free trajectory optimization approach.
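The abstract mentions a Pareto frontier selection mechanism but does not specify it. As a hedged illustration of the general idea only, the sketch below keeps every candidate whose multi-objective score vector is not dominated by another candidate, rather than collapsing the scores into a single scalar. The names `dominates` and `pareto_front`, the candidate labels, and the two objectives (read them as, say, safety and comfort) are all hypothetical.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b on every objective
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, score in candidates.items()
        if not any(
            dominates(other, score)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical candidate trajectories scored on two objectives.
candidates = {
    "traj_a": (0.9, 0.4),
    "traj_b": (0.6, 0.8),
    "traj_c": (0.5, 0.3),
    "traj_d": (0.9, 0.2),
}
front = pareto_front(candidates)
```

Here `traj_a` and `traj_b` survive because each is best on one objective, which is the kind of trade-off a scalar reward would hide.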


【10】Physics-Constrained Fine-Tuning of Flow-Matching Models for Generation and Inverse Problems
Link: https://arxiv.org/abs/2508.09156

Authors: rschmidt, Sophie Fellenz, Sebastian J. Vollmer, Andrew B. Duncan
Comments: 7 pages main content, 10 pages appendices
Abstract: We present a framework for fine-tuning flow-matching generative models to enforce physical constraints and solve inverse problems in scientific systems. Starting from a model trained on low-fidelity or observational data, we apply a differentiable post-training procedure that minimizes weak-form residuals of governing partial differential equations (PDEs), promoting physical consistency and adherence to boundary conditions without distorting the underlying learned distribution. To infer unknown physical inputs, such as source terms, material parameters, or boundary data, we augment the generative process with a learnable latent parameter predictor and propose a joint optimization strategy. The resulting model produces physically valid field solutions alongside plausible estimates of hidden parameters, effectively addressing ill-posed inverse problems in a data-driven yet physics-aware manner. We validate our method on canonical PDE benchmarks, demonstrating improved satisfaction of PDE constraints and accurate recovery of latent coefficients. Our approach bridges generative modelling and scientific inference, opening new avenues for simulation-augmented discovery and data-efficient modelling of physical systems.


【11】On the Generalization Limits of Quantum Generative Adversarial Networks with Pure State Generators
Link: https://arxiv.org/abs/2508.09844

Authors: katovic, Akash Malemath, Ivan Kankeu, Yannick Werner, Matthias Tschöpe, Vitor Fortes Rey, Sungho Suh, Paul Lukowicz, Nikolaos Palaiodimopoulos, Maximilian Kiefer-Emmanouilidis
Comments: 16 pages, 5 figures
Abstract: We investigate the capabilities of Quantum Generative Adversarial Networks (QGANs) in image generation tasks. Our analysis centers on fully quantum implementations of both the generator and discriminator. Through extensive numerical testing of current main architectures, we find that QGANs struggle to generalize across datasets, converging on merely the average representation of the training data. When the output of the generator is a pure state, we analytically derive a lower bound for the discriminator quality given by the fidelity between the pure-state output of the generator and the target data distribution, thereby providing a theoretical explanation for the limitations observed in current models. Our findings reveal fundamental challenges in the generalization capabilities of existing quantum generative models. While our analysis focuses on QGANs, the results carry broader implications for the performance of related quantum generative models.


【12】Improving the Speaker Anonymization Evaluation's Robustness to Target Speakers with Adversarial Learning
Link: https://arxiv.org/abs/2508.09803

Authors: anzreb, Arnab Das, Tim Polzehl, Sebastian Möller
Abstract: The current privacy evaluation for speaker anonymization often overestimates privacy when a same-gender target selection algorithm (TSA) is used, although this TSA leaks the speaker's gender and should hence be more vulnerable. We hypothesize that this occurs because the evaluation does not account for the fact that anonymized speech contains information from both the source and target speakers. To address this, we propose to add a target classifier that measures the influence of target speaker information in the evaluation, which can also be removed with adversarial learning. Experiments demonstrate that this approach is effective for multiple anonymizers, particularly when using a same-gender TSA, leading to a more reliable assessment.


【13】Quantum-Enhanced Generative Adversarial Networks: Comparative Analysis of Classical and Hybrid Quantum-Classical Generative Adversarial Networks
Link: https://arxiv.org/abs/2508.09209

Authors: Goh
Comments: 9 pages, 9 figures, 3 tables
Abstract: Generative adversarial networks (GANs) have emerged as a powerful paradigm for producing high-fidelity data samples, yet their performance is constrained by the quality of latent representations, typically sampled from classical noise distributions. This study investigates hybrid quantum-classical GANs (HQCGANs) in which a quantum generator, implemented via parameterised quantum circuits, produces latent vectors for a classical discriminator. We evaluate a classical GAN alongside three HQCGAN variants with 3, 5, and 7 qubits, using Qiskit's AerSimulator with realistic noise models to emulate near-term quantum devices. The binary MNIST dataset (digits 0 and 1) is used to align with the low-dimensional latent spaces imposed by current quantum hardware. Models are trained for 150 epochs and assessed with Fréchet Inception Distance (FID) and Kernel Inception Distance (KID). Results show that while the classical GAN achieved the best scores, the 7-qubit HQCGAN produced competitive performance, narrowing the gap in later epochs, whereas the 3-qubit model exhibited earlier convergence limitations. Efficiency analysis indicates only moderate training time increases despite quantum sampling overhead. These findings validate the feasibility of noisy quantum circuits as latent priors in GAN architectures, highlighting their potential to enhance generative modelling within the constraints of the noisy intermediate-scale quantum (NISQ) era.


Semi-/Weakly/Un-/Supervised | Uncertainty | Active Learning (2 papers)

【1】Social-Sensor Identity Cloning Detection Using Weakly Supervised Deep Forest and Cryptographic Authentication
Link: https://arxiv.org/abs/2508.09665

Authors: arbi, Hai Dong, Xun Yi
Comments: 23 pages
Abstract: Recent years have witnessed a rising trend in social-sensor cloud identity cloning incidents. However, existing approaches suffer from unsatisfactory performance, a lack of solutions for detecting duplicated accounts, and a lack of large-scale evaluations on real-world datasets. We introduce a novel method for detecting identity cloning in social-sensor cloud service providers. Our proposed technique consists of two primary components: 1) a similar identity detection method and 2) a cryptography-based authentication protocol. Initially, we developed a weakly supervised deep forest model to identify similar identities using non-privacy-sensitive user profile features provided by the service. Subsequently, we designed a cryptography-based authentication protocol to verify whether similar identities were generated by the same provider. Our extensive experiments on a large real-world dataset demonstrate the feasibility and superior performance of our technique compared to current state-of-the-art identity clone detection methods.


【2】FIVA: Federated Inverse Variance Averaging for Universal CT Segmentation with Uncertainty Estimation
Link: https://arxiv.org/abs/2508.09196

Authors: e, Numan Saeed, Karthik Nandakumar
Comments: 17 pages, 5 figures, Machine Learning for Healthcare Conference
Abstract: Different CT segmentation datasets are typically obtained from different scanners under different capture settings and often provide segmentation labels for a limited and often disjoint set of organs. Using these heterogeneous data effectively while preserving patient privacy can be challenging. This work presents a novel federated learning approach to achieve universal segmentation across diverse abdominal CT datasets by utilizing model uncertainty for aggregation and predictive uncertainty for inference. Our approach leverages the inherent noise in stochastic mini-batch gradient descent to estimate a distribution over the model weights, providing an on-the-go uncertainty over the model parameters at the client level. The parameters are then aggregated at the server with this additional uncertainty information, using a Bayesian-inspired inverse-variance aggregation scheme. Furthermore, the proposed method quantifies prediction uncertainty by propagating the uncertainty from the model weights, providing confidence measures essential for clinical decision-making. In line with recent work, predictive uncertainty is utilized in the inference stage to improve predictive performance. Experimental evaluations demonstrate the effectiveness of this approach in improving both the quality of federated aggregation and uncertainty-weighted inference compared to previously established baselines. The code for this work is available at: https://github.com/asimukaye/fiva
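The abstract names an inverse-variance aggregation scheme without giving its formula. A minimal sketch of the textbook precision-weighted mean is shown below, on the assumption that each client reports a per-parameter mean and variance; the function name `inverse_variance_average`, the `eps` guard, and the toy numbers are hypothetical and are not taken from the FIVA repository.

```python
def inverse_variance_average(means, variances, eps=1e-12):
    """Aggregate per-client parameter estimates, weighting each by 1 / variance.

    means, variances: one list per client, each of length n_params.
    Returns (aggregated_mean, aggregated_variance) as lists of length n_params.
    """
    n_params = len(means[0])
    agg_mean, agg_var = [], []
    for j in range(n_params):
        # Precision (inverse variance) of parameter j at each client.
        precisions = [1.0 / (var[j] + eps) for var in variances]
        total = sum(precisions)
        # Clients that are more certain about a parameter get more weight.
        agg_mean.append(sum(p * m[j] for p, m in zip(precisions, means)) / total)
        agg_var.append(1.0 / total)
    return agg_mean, agg_var


# Two hypothetical clients, two parameters each.
client_means = [[1.0, 0.0], [3.0, 2.0]]
client_vars = [[1.0, 4.0], [1.0, 1.0]]
mean, var = inverse_variance_average(client_means, client_vars)
```

With equal variances the result is the plain average (first parameter); with unequal variances it is pulled toward the more confident client (second parameter).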


Transfer | Zero/Few/One-Shot | Adaptation (6 papers)

【1】HKT: A Biologically Inspired Framework for Modular Hereditary Knowledge Transfer in Neural Networks
Link: https://arxiv.org/abs/2508.09743

Authors: istian Tchenko, Felix Mohr, Hicham Hadj Abdelkader, Hedi Tabia
Abstract: A prevailing trend in neural network research suggests that model performance improves with increasing depth and capacity - often at the cost of integrability and efficiency. In this paper, we propose a strategy to optimize small, deployable models by enhancing their capabilities through structured knowledge inheritance. We introduce Hereditary Knowledge Transfer (HKT), a biologically inspired framework for modular and selective transfer of task-relevant features from a larger, pretrained parent network to a smaller child model. Unlike standard knowledge distillation, which enforces uniform imitation of teacher outputs, HKT draws inspiration from biological inheritance mechanisms - such as memory RNA transfer in planarians - to guide a multi-stage process of feature transfer. Neural network blocks are treated as functional carriers, and knowledge is transmitted through three biologically motivated components: Extraction, Transfer, and Mixture (ETM). A novel Genetic Attention (GA) mechanism governs the integration of inherited and native representations, ensuring both alignment and selectivity. We evaluate HKT across diverse vision tasks, including optical flow (Sintel, KITTI), image classification (CIFAR-10), and semantic segmentation (LiTS), demonstrating that it significantly improves child model performance while preserving its compactness. The results show that HKT consistently outperforms conventional distillation approaches, offering a general-purpose, interpretable, and scalable solution for deploying high-performance neural networks in resource-constrained environments.


【2】Value Function Initialization for Knowledge Transfer and Jump-start in Deep Reinforcement Learning
Link: https://arxiv.org/abs/2508.09277

Authors: himeh
Abstract: Value function initialization (VFI) is an effective way to achieve a jumpstart in reinforcement learning (RL) by leveraging value estimates from prior tasks. While this approach is well established in tabular settings, extending it to deep reinforcement learning (DRL) poses challenges due to the continuous nature of the state-action space, the noisy approximations of neural networks, and the impracticality of storing all past models for reuse. In this work, we address these challenges and introduce DQInit, a method that adapts value function initialization to DRL. DQInit reuses compact tabular Q-values extracted from previously solved tasks as a transferable knowledge base. It employs a knownness-based mechanism to softly integrate these transferred values into underexplored regions and gradually shift toward the agent's learned estimates, avoiding the limitations of fixed time decay. Our approach offers a novel perspective on knowledge transfer in DRL by relying solely on value estimates rather than policies or demonstrations, effectively combining the strengths of jumpstart RL and policy distillation while mitigating their drawbacks. Experiments across multiple continuous control tasks demonstrate that DQInit consistently improves early learning efficiency, stability, and overall performance compared to standard initialization and existing transfer techniques.
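The abstract describes a knownness-based mechanism that leans on transferred values in underexplored regions and shifts toward the agent's own estimates as experience accumulates, but it does not give the exact rule. One possible reading, offered purely as an assumption, is a convex blend whose weight grows with the visit count of a state-action region. The functions `knownness` and `blended_q`, the linear ramp, and the `saturation` parameter are hypothetical and are not DQInit's actual definitions.

```python
def knownness(visit_count, saturation=10):
    """Hypothetical knownness score in [0, 1]: 0 for unvisited regions,
    1 once a region has been visited `saturation` times or more."""
    return min(1.0, visit_count / saturation)


def blended_q(q_transferred, q_learned, visit_count, saturation=10):
    """Blend a transferred Q-value with the agent's learned estimate.

    Unvisited regions rely on the transferred value; well-explored regions
    rely on the learned value. The shift depends on visits, not elapsed time.
    """
    k = knownness(visit_count, saturation)
    return (1.0 - k) * q_transferred + k * q_learned


# Transferred value 5.0, learned value 1.0, at increasing visit counts.
values = [blended_q(5.0, 1.0, n) for n in (0, 5, 20)]
```

Because the weight depends on how often a region has been visited rather than on the training step, rarely visited regions keep using transferred values late in training, which is one way to avoid a fixed time decay.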


【3】Harnessing Input-Adaptive Inference for Efficient VLN
标题:利用输入自适应推理实现高效VLN
链接:https://arxiv.org/abs/2508.09262

作者:ang, Akhil Perincherry, Zachary Coalson, Aiden Gabriel, Stefan Lee, Sanghyun Hong
备注:Accepted to ICCV 2025 [Poster]
摘要:视觉和语言导航(VLN)中的一个新兴范例是使用历史感知的多模态Transformer模型。给定一个语言指令,这些模型处理观察和导航历史,以预测智能体的最适当动作。虽然它们显著提高了性能,但这些模型的规模在计算资源有限的实际环境中可能是一个瓶颈。在这项工作中,我们提出了一种新的输入自适应导航方法,以提高VLN模型的效率。我们首先表明,现有的输入自适应机制未能减少计算,而没有实质性的性能下降。为了解决这个问题,我们引入了三个自适应算法,每个部署在不同的水平:(1)为了提高空间效率,我们有选择地处理全景视图在每个观察的代理。(2)为了提高模型内的效率,我们提出了基于重要性的自适应阈值的早期退出方法。(3)为了提高时间效率,我们实现了一个缓存机制,防止再处理以前看到的代理的意见。在七个VLN基准的评估中,我们证明了在标准和连续环境中,三个现成的代理的计算减少了2$\times$。我们的代码可在https://github.com/secure-ai-systems-group/adaptive-vision-and-language-navigation上公开获取。
摘要:An emerging paradigm in vision-and-language navigation (VLN) is the use of history-aware multi-modal transformer models. Given a language instruction, these models process observation and navigation history to predict the most appropriate action for an agent. While they have significantly improved performance, the scale of these models can be a bottleneck in practical settings with limited computational resources. In this work, we propose a novel input-adaptive navigation method to enhance VLN model efficiency. We first show that existing input-adaptive mechanisms fail to reduce computations without substantial performance degradation. To address this, we introduce three adaptive algorithms, each deployed at a different level: (1) To improve spatial efficiency, we selectively process panoramic views at each observation of an agent. (2) To improve intra-model efficiency, we propose importance-based adaptive thresholding for the early-exit methods. (3) To improve temporal efficiency, we implement a caching mechanism that prevents reprocessing of views previously seen by the agent. In evaluations on seven VLN benchmarks, we demonstrate over a 2$\times$ reduction in computation across three off-the-shelf agents in both standard and continuous environments. Our code is publicly available at https://github.com/secure-ai-systems-group/adaptive-vision-and-language-navigation.
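
The temporal-efficiency idea in item (3), avoiding reprocessing of views the agent has already seen, can be sketched as a simple memoizing wrapper. This is a sketch under stated assumptions, not the released code: the class `ViewCache`, the `view_id` key, and the `encoder` callable are hypothetical, and how the paper actually identifies a previously seen view is not stated in the abstract.

```python
class ViewCache:
    """Memoize an expensive view encoder, keyed by a hypothetical view identifier."""

    def __init__(self, encoder):
        self.encoder = encoder      # any callable mapping a view to its encoding
        self.store = {}             # view_id -> cached encoding
        self.encoder_calls = 0      # how many times the encoder actually ran

    def encode(self, view_id, view):
        # Run the encoder only the first time a view identifier is seen.
        if view_id not in self.store:
            self.store[view_id] = self.encoder(view)
            self.encoder_calls += 1
        return self.store[view_id]
```

Comparing `encoder_calls` with the number of `encode` requests gives a rough measure of how much computation the cache saves on a given trajectory.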


【4】Hierarchical Adaptive networks with Task vectors for Test-Time Adaptation
标题 :具有用于测试时自适应的任务载体的分层自适应网络
链接:https://arxiv.org/abs/2508.09223

作者:bekar, Daniel M. Lang, Julia A. Schnabel
摘要:测试时自适应允许预训练的模型适应传入的数据流,解决源域和目标域之间的分布变化。然而,标准方法依赖于一维线性分类层,这往往无法处理多样化和复杂的变化。我们提出了具有任务向量的分层自适应网络(Hi-Vec),它利用多个层的不断增加的大小进行动态测试时的适应。通过将编码器的表示空间分解成这样的分层组织层,Hi-Vec以即插即用的方式允许现有方法适应不同复杂度的移位。我们的贡献有三个方面:首先,我们提出了动态层选择自动识别的最佳层,以适应每个测试批次。其次,我们提出了一种机制,将权重从动态层合并到其他层,确保所有层都接收目标信息。第三,我们提出了线性层协议,作为一个门控函数,防止错误的微调,适应嘈杂的批次。我们严格评估了Hi-Vec在具有挑战性的场景和多个目标数据集上的性能,证明了其推进最先进方法的强大能力。我们的研究结果表明,Hi-Vec提高了鲁棒性,解决了不确定性,并处理了有限的批量大小和增加的离群值率。
摘要:Test-time adaptation allows pretrained models to adjust to incoming data streams, addressing distribution shifts between source and target domains. However, standard methods rely on single-dimensional linear classification layers, which often fail to handle diverse and complex shifts. We propose Hierarchical Adaptive Networks with Task Vectors (Hi-Vec), which leverages multiple layers of increasing size for dynamic test-time adaptation. By decomposing the encoder's representation space into such hierarchically organized layers, Hi-Vec, in a plug-and-play manner, allows existing methods to adapt to shifts of varying complexity. Our contributions are threefold: First, we propose dynamic layer selection for automatic identification of the optimal layer for adaptation to each test batch. Second, we propose a mechanism that merges weights from the dynamic layer to other layers, ensuring all layers receive target information. Third, we propose linear layer agreement that acts as a gating function, preventing erroneous fine-tuning by adaptation on noisy batches. We rigorously evaluate the performance of Hi-Vec in challenging scenarios and on multiple target datasets, proving its strong capability to advance state-of-the-art methods. Our results show that Hi-Vec improves robustness, addresses uncertainty, and handles limited batch sizes and increased outlier rates.


【5】A Rolling Stone Gathers No Moss: Adaptive Policy Optimization for Stable Self-Evaluation in Large Multimodal Models
标题:滚石不生苔:大型多模态模型中稳定自评估的自适应策略优化
链接:https://arxiv.org/abs/2508.09155

作者:ng, Hongcan Guo, Zheqi Lv, Shengyu Zhang
备注:17 pages, 9 figures
Abstract: Self-evaluation, a model's ability to assess the correctness of its own output, is crucial for Large Multimodal Models (LMMs) to achieve self-improvement in multi-turn conversations, yet it is largely absent in foundation models. Recent work has employed reinforcement learning (RL) to enhance self-evaluation; however, its fixed reward mechanism suffers from reward hacking when optimizing multiple training objectives, leading to model collapse. In this paper, we propose AdaPO, an online reinforcement learning framework capable of adaptively adjusting the training objective in real time according to the current training state of each task. Specifically, to mitigate reward hacking, AdaPO introduces an Adaptive Reward Model (ARM) and a Reward Aware Dynamic KL Regularization mechanism. ARM assesses a task's training state from the performance distribution of model-generated multi-turn trajectories. Reward Aware Dynamic KL replaces a fixed penalty with dynamic coefficients that are modulated by the reward gap between different multi-turn situations. Notably, our method automatically and smoothly adjusts its learning focus based on the sub-tasks' training progress, without manual intervention. Extensive experiments over 8 benchmarks and various models show that our method significantly enhances both direct reasoning and self-evaluation capability. We will release our code to contribute to the community.


【6】Scalable h-adaptive probabilistic solver for time-independent and time-dependent systems
标题:时无关和时相关系统的可缩放h-自适应概率解算器
链接:https://arxiv.org/abs/2508.09623

作者:akur, Sawan Kumar, Matthew Zahr, Souvik Chakraborty
摘要:在概率数值框架内求解偏微分方程(PDE)提供了一种量化离散化所产生的认知不确定性的原则性方法。通过利用高斯过程回归并将控制PDE作为配置点的有限集合的约束,概率数值在任意位置提供无网格解决方案。然而,高的计算成本,其规模与配置点的数量成立方,仍然是一个关键的瓶颈,特别是对于大规模或高维问题。我们提出通过两项关键创新对该范式进行可扩展的增强。首先,我们开发了一种随机双重下降算法,该算法将每次迭代的复杂度从配置点数量的立方降低到线性,从而实现易于推理。其次,我们采用了基于聚类的主动学习策略,自适应地选择搭配点,以最大限度地提高信息增益,同时最大限度地减少计算费用。总之,这些贡献导致$h$自适应概率求解器,可以扩展到大量的配置点。我们证明了所提出的求解器在基准偏微分方程上的有效性,包括二维和三维稳态椭圆问题,以及在时空环境中制定的时间依赖的抛物偏微分方程。
摘要:Solving partial differential equations (PDEs) within the framework of probabilistic numerics offers a principled approach to quantifying epistemic uncertainty arising from discretization. By leveraging Gaussian process regression and imposing the governing PDE as a constraint at a finite set of collocation points, probabilistic numerics delivers mesh-free solutions at arbitrary locations. However, the high computational cost, which scales cubically with the number of collocation points, remains a critical bottleneck, particularly for large-scale or high-dimensional problems. We propose a scalable enhancement to this paradigm through two key innovations. First, we develop a stochastic dual descent algorithm that reduces the per-iteration complexity from cubic to linear in the number of collocation points, enabling tractable inference. Second, we exploit a clustering-based active learning strategy that adaptively selects collocation points to maximize information gain while minimizing computational expense. Together, these contributions result in an $h$-adaptive probabilistic solver that can scale to a large number of collocation points. We demonstrate the efficacy of the proposed solver on benchmark PDEs, including two- and three-dimensional steady-state elliptic problems, as well as a time-dependent parabolic PDE formulated in a space-time setting.


Reinforcement Learning (3 papers)

【1】Goal Discovery with Causal Capacity for Efficient Reinforcement Learning
标题:基于因果容量的目标发现算法及其强化学习
链接:https://arxiv.org/abs/2508.09624

作者:aodong Yang, Zhengbo Lu, Chengdong Ma, Wengang Zhou, Houqiang Li
Abstract: Causal inference is crucial for humans to explore the world, and it can be modeled to enable an agent to explore the environment efficiently in reinforcement learning. Existing research indicates that establishing the causality between actions and state transitions enhances an agent's ability to reason about how a policy affects its future trajectory, thereby promoting directed exploration. However, measuring this causality is challenging because it is intractable in the vast state-action space of complex scenarios. In this paper, we propose a novel Goal Discovery with Causal Capacity (GDCC) framework for efficient environment exploration. Specifically, we first derive a measurement of causality in state space, i.e., causal capacity, which represents the highest influence of an agent's behavior on future trajectories. We then present a Monte Carlo based method to identify critical points in discrete state spaces and further optimize this method for continuous high-dimensional environments. These critical points reveal where the agent makes important decisions in the environment and are then treated as subgoals that guide the agent to explore more purposefully and efficiently. Empirical results from multi-objective tasks demonstrate that states with high causal capacity align with our expected subgoals, and that GDCC achieves significant success rate improvements over baselines.


【2】Distilling Reinforcement Learning into Single-Batch Datasets
标题:将强化学习提炼为单批数据集
链接:https://arxiv.org/abs/2508.09283

作者:lhelm, Dan Ventura
备注:to be published in ECAI 2025 (appendix in arXiv version only), 11 pages (7 content, 4 appendix), 6 figures
摘要:数据集蒸馏将大数据集压缩成小的合成数据集,使得对合成数据集的学习近似于对原始数据集的学习。对提取数据集的训练可以在梯度下降的一个步骤中进行。我们通过将强化学习环境蒸馏成一批监督学习数据集,证明了蒸馏可推广到不同的任务。这不仅证明了Distillation压缩强化学习任务的能力,还证明了它将一种学习模式(强化学习)转换为另一种学习模式(监督学习)的能力。我们提出了一种新的扩展近端策略优化的元学习,并使用它在蒸馏的多维扩展的经典车杆问题,所有的MuJoCo环境,和几个雅达利游戏。我们展示了蒸馏将复杂的RL环境压缩到一步监督学习中的能力,探索了RL蒸馏在学习者架构中的可推广性,并展示了将环境蒸馏成最小可能的合成数据集的能力。
摘要:Dataset distillation compresses a large dataset into a small synthetic dataset such that learning on the synthetic dataset approximates learning on the original. Training on the distilled dataset can be performed in as little as one step of gradient descent. We demonstrate that distillation is generalizable to different tasks by distilling reinforcement learning environments into one-batch supervised learning datasets. This demonstrates not only distillation's ability to compress a reinforcement learning task but also its ability to transform one learning modality (reinforcement learning) into another (supervised learning). We present a novel extension of proximal policy optimization for meta-learning and use it in distillation of a multi-dimensional extension of the classic cart-pole problem, all MuJoCo environments, and several Atari games. We demonstrate distillation's ability to compress complex RL environments into one-step supervised learning, explore RL distillation's generalizability across learner architectures, and demonstrate distilling an environment into the smallest-possible synthetic dataset.
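
The claim that training on a distilled dataset can take as little as one gradient step can be made concrete with a toy learner. The sketch below shows only the learner side, a single gradient-descent step on one synthetic batch; it does not show how the synthetic batch is found, which is the paper's contribution. The 1-D linear model, the mean-squared-error loss, and the function name `one_step_train` are assumptions for illustration.

```python
def one_step_train(w, b, batch, lr):
    """One gradient-descent step for the model y = w * x + b under mean squared error.

    `batch` is a list of (x, y) pairs standing in for a distilled synthetic batch.
    """
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return w - lr * grad_w, b - lr * grad_b


# A single synthetic example and a matching learning rate (both hypothetical)
# move the untrained model (w=0, b=0) to one that fits the point (1, 2).
w, b = one_step_train(0.0, 0.0, [(1.0, 2.0)], lr=0.25)
```

In this toy case the synthetic point and the learning rate are chosen by hand; in a distillation setting both would be outputs of an outer optimization.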


【3】Quantum-Efficient Reinforcement Learning Solutions for Last-Mile On-Demand Delivery
标题:用于最后一英里按需交付的量子高效强化学习解决方案
链接:https://arxiv.org/abs/2508.09183

作者:osavi, Bilal Farooq
摘要:量子计算已被证明是解决NP难组合问题的一个有前途的选择。具体来说,当涉及到优化时,经典方法变得难以解释大规模解决方案。具体来说,我们研究量子计算来解决带有时间窗口的大规模容量限制提货和交付问题(CPDPTW)。在这方面,一个强化学习(RL)框架增强了参数化量子电路(PQC)的设计,以最大限度地减少旅行时间在一个现实的最后一英里按需交付。提出了一种新颖的具有纠缠变分层的问题特定编码量子电路。此外,最近策略优化(PPO)和量子奇异值变换(QSVT)的设计比较,通过数值实验,突出了所提出的方法的优势,在规模的解决方案和训练的复杂性,同时结合现实世界的约束。
Abstract: Quantum computation has emerged as a promising alternative for solving NP-hard combinatorial problems. In optimization in particular, classical approaches become intractable at large scale. We investigate quantum computing for solving the large-scale Capacitated Pickup and Delivery Problem with Time Windows (CPDPTW). To this end, a Reinforcement Learning (RL) framework augmented with a Parametrized Quantum Circuit (PQC) is designed to minimize travel time in a realistic last-mile on-demand delivery setting. A novel problem-specific encoding quantum circuit with an entangling and variational layer is proposed. Moreover, Proximal Policy Optimization (PPO) and Quantum Singular Value Transformation (QSVT) are designed for comparison through numerical experiments, highlighting the superiority of the proposed method in terms of solution scale and training complexity while incorporating real-world constraints.


Meta-Learning (2 papers)

【1】Domain-Generalization to Improve Learning in Meta-Learning Algorithms
标题:领域概括以改善元学习算法中的学习
链接:https://arxiv.org/abs/2508.09418

作者:um, Chris Stockman, Cat Luong, Justin Zhan
摘要:本文介绍了领域泛化尖锐度感知最小化模型不可知元学习(DGS-MAML),这是一种新的元学习算法,旨在通过有限的训练数据在任务之间进行泛化。DGS-MAML在一个双层优化框架中将梯度匹配与锐度感知最小化相结合,以增强模型的适应性和鲁棒性。我们支持我们的方法与理论分析,使用PAC贝叶斯和收敛保证。在基准数据集上的实验结果表明,DGS-MAML在准确性和泛化能力方面优于现有方法。所提出的方法对于需要Few-Shot学习和快速适应的场景特别有用,并且源代码可在GitHub上公开获取。
摘要:This paper introduces Domain Generalization Sharpness-Aware Minimization Model-Agnostic Meta-Learning (DGS-MAML), a novel meta-learning algorithm designed to generalize across tasks with limited training data. DGS-MAML combines gradient matching with sharpness-aware minimization in a bi-level optimization framework to enhance model adaptability and robustness. We support our method with theoretical analysis using PAC-Bayes and convergence guarantees. Experimental results on benchmark datasets show that DGS-MAML outperforms existing approaches in terms of accuracy and generalization. The proposed method is particularly useful for scenarios requiring few-shot learning and quick adaptation, and the source code is publicly available at GitHub.
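
One ingredient named in the abstract, sharpness-aware minimization (SAM), can be sketched as a two-gradient update. The code below is plain SAM on a parameter list, not DGS-MAML: it omits gradient matching and the bi-level meta-learning loop, and the names `sam_step`, `grad_fn`, `lr`, and `rho` are chosen here for illustration.

```python
import math


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware minimization step on a list of parameters.

    grad_fn maps a parameter list to the gradient of the loss at that point.
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # at a stationary point there is no ascent direction
    # Move to the approximate worst-case point within an L2 ball of radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descend from the original weights using the gradient taken at that point.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]
```

For example, with the loss `sum(x * x for x in w)` the gradient function is `lambda w: [2 * x for x in w]`, and each step uses the gradient at the perturbed weights rather than at the current ones.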


【2】Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments
标题:分散环境下加速大型模型推理的元学习
链接:https://arxiv.org/abs/2508.09194

作者:, Zihao Wang, Ahmad Farhan, Claudio Angione, Harry Yang, Fielding Johnston, James P. Buban, Patrick Colangelo, Yue Zhao, Yuzhe Yang
备注:COLM2025
摘要:大规模模型的部署,例如大型语言模型(LLM),由于其计算需求而产生了大量成本。为了降低这些成本并解决与可扩展性和数据安全性相关的挑战,模型部署越来越多地转向分散式系统,其中选择有效的推理加速方案对于有效管理计算资源和增强系统响应能力至关重要。在这项工作中,我们通过引入基于元学习的框架来解决在分散系统中选择最佳加速方法的挑战。该框架通过从跨不同任务的各种加速技术的历史性能数据中学习来自动化选择过程。与依赖随机选择或专家直觉的传统方法不同,我们的方法根据每个任务的具体特征系统地确定最佳加速策略。我们证明,我们的元学习框架不仅简化了决策过程,而且在效率和性能方面始终优于传统方法。我们的研究结果突出了去中心化人工智能系统中推理加速的潜力,为实现更民主、更经济可行的人工智能解决方案提供了一条道路。
Abstract: The deployment of large-scale models, such as large language models (LLMs), incurs substantial costs due to their computational demands. To mitigate these costs and address challenges related to scalability and data security, there is a growing shift towards decentralized systems for model deployment, where choosing efficient inference acceleration schemes becomes crucial for managing computational resources effectively and enhancing system responsiveness. In this work, we address the challenge of selecting optimal acceleration methods in decentralized systems by introducing a meta-learning-based framework. This framework automates the selection process by learning from historical performance data of various acceleration techniques across different tasks. Unlike traditional methods that rely on random selection or expert intuition, our approach systematically identifies the best acceleration strategies based on the specific characteristics of each task. We demonstrate that our meta-learning framework not only streamlines the decision-making process but also consistently outperforms conventional methods in terms of efficiency and performance. Our results highlight the potential of inference acceleration in decentralized AI systems, offering a path towards more democratic and economically feasible artificial intelligence solutions.


Medical (5 papers)

【1】NEURAL: Attention-Guided Pruning for Unified Multimodal Resource-Constrained Clinical Evaluation
标题:神经:针对统一多模式资源限制临床评估的注意力引导修剪
链接:https://arxiv.org/abs/2508.09715

作者:oshi, Islem Rekik
摘要:多模态医学成像数据的快速增长提出了重大的存储和传输挑战,特别是在资源受限的临床环境中。我们提出了神经,一个新的框架,通过使用语义引导的数据压缩来解决这个问题。我们的方法从微调的生成视觉语言模型中重新调整图像及其放射学报告之间的交叉注意力分数,以在结构上修剪胸部X射线,仅保留诊断关键区域。此过程将图像转换为高度压缩的图形表示。这种统一的基于图的表示将修剪后的视觉图与从临床报告中导出的知识图融合在一起,创建了简化下游建模的通用数据结构。在肺炎检测的MIMIC-CXR和CheXpert Plus数据集上验证,NEURAL实现了93.4- 97.7%的图像数据大小减少,同时保持了0.88-0.95 AUC的高诊断性能,优于其他使用未压缩数据的基线模型。通过创建一个持久的、与任务无关的数据资产,NEURAL解决了数据大小和临床实用性之间的权衡,实现了高效的工作流程和远程放射学,而不会牺牲性能。我们的NEURAL代码可在https://github.com/basiralab/NEURAL上获得。
Abstract: The rapid growth of multimodal medical imaging data presents significant storage and transmission challenges, particularly in resource-constrained clinical settings. We propose NEURAL, a novel framework that addresses this by using semantics-guided data compression. Our approach repurposes cross-attention scores between the image and its radiological report from a fine-tuned generative vision-language model to structurally prune chest X-rays, preserving only diagnostically critical regions. This process transforms the image into a highly compressed graph representation. This unified graph-based representation fuses the pruned visual graph with a knowledge graph derived from the clinical report, creating a universal data structure that simplifies downstream modeling. Validated on the MIMIC-CXR and CheXpert Plus datasets for pneumonia detection, NEURAL achieves a 93.4-97.7% reduction in image data size while maintaining a high diagnostic performance of 0.88-0.95 AUC, outperforming other baseline models that use uncompressed data. By creating a persistent, task-agnostic data asset, NEURAL resolves the trade-off between data size and clinical utility, enabling efficient workflows and teleradiology without sacrificing performance. Our NEURAL code is available at https://github.com/basiralab/NEURAL.
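
The attention-guided pruning step can be illustrated as keeping only the highest-scoring image patches. This is a sketch under stated assumptions rather than the NEURAL code: it covers only top-k patch selection, not the graph construction or the fusion with the report's knowledge graph, and the function name `prune_patches` and the `keep_ratio` parameter are hypothetical.

```python
def prune_patches(attention_scores, keep_ratio=0.05):
    """Return the indices of the patches with the highest attention scores.

    attention_scores: one score per image patch (assumed already aggregated
    over report tokens). keep_ratio: fraction of patches to keep.
    """
    n_keep = max(1, round(len(attention_scores) * keep_ratio))
    ranked = sorted(
        range(len(attention_scores)),
        key=lambda i: attention_scores[i],
        reverse=True,
    )
    # Return kept indices in spatial order so positions can be reused as graph nodes.
    return sorted(ranked[:n_keep])
```

The default `keep_ratio` is an illustrative value only; the abstract reports the resulting size reduction, not the selection rule or threshold used to reach it.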


【2】FedMP: Tackling Medical Feature Heterogeneity in Federated Learning from a Manifold Perspective
标题:FedMP:从多元化角度解决联邦学习中的医疗特征异异性
链接:https://arxiv.org/abs/2508.09174

作者:ou, Shudong Liu, Zhaokun Zhou, Yang Liu, Qiang Yang, Yuesheng Zhu, Guibo Luo
摘要:联邦学习(FL)是一种分散的机器学习范式,其中多个客户端协作训练共享模型,而无需共享其本地私有数据。然而,FL的实际应用经常遇到来自参与客户端的非相同和独立分布(非IID)本地数据集的挑战,这在医学成像领域尤其明显,其中图像特征分布的变化显著阻碍了全局模型的收敛和性能。为了应对这一挑战,我们提出了FedMP,一种新的方法,旨在增强FL下非IID的情况下。FedMP采用随机特征流形完成来丰富单个客户端分类器的训练空间,并利用类原型来指导语义一致子空间内跨客户端的特征流形对齐,从而促进构建更清晰的决策边界。我们在多个医学成像数据集上验证了FedMP的有效性,包括具有真实世界多中心分布的数据集,以及多域自然图像数据集。实验结果表明,FedMP优于现有的FL算法。此外,我们还分析了在我们的方法中特征暴露的多维度、通信效率和隐私影响的影响。
摘要:Federated learning (FL) is a decentralized machine learning paradigm in which multiple clients collaboratively train a shared model without sharing their local private data. However, real-world applications of FL frequently encounter challenges arising from the non-identically and independently distributed (non-IID) local datasets across participating clients, which is particularly pronounced in the field of medical imaging, where shifts in image feature distributions significantly hinder the global model's convergence and performance. To address this challenge, we propose FedMP, a novel method designed to enhance FL under non-IID scenarios. FedMP employs stochastic feature manifold completion to enrich the training space of individual client classifiers, and leverages class-prototypes to guide the alignment of feature manifolds across clients within semantically consistent subspaces, facilitating the construction of more distinct decision boundaries. We validate the effectiveness of FedMP on multiple medical imaging datasets, including those with real-world multi-center distributions, as well as on a multi-domain natural image dataset. The experimental results demonstrate that FedMP outperforms existing FL algorithms. Additionally, we analyze the impact of manifold dimensionality, communication efficiency, and privacy implications of feature exposure in our method.
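
Class prototypes, which the abstract uses to guide cross-client alignment, are commonly computed as the mean feature vector of each class. The sketch below assumes that definition, which the abstract does not confirm for FedMP; the function name `class_prototypes` is hypothetical, and the manifold completion and alignment steps are not shown.

```python
def class_prototypes(features, labels):
    """Compute one prototype per class as the mean of that class's feature vectors.

    features: list of equal-length feature vectors (lists of floats).
    labels: list of class labels, one per feature vector.
    """
    sums = {}
    counts = {}
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }
```

In a federated setting, exchanging such per-class summaries instead of raw features is one way clients can align representations without sharing local data, although the abstract notes that the privacy implications of feature exposure still need analysis.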


【3】Masked Training for Robust Arrhythmia Detection from Digitalized Multiple Layout ECG Images
标题:基于掩蔽训练的数字化多布局心电图心律失常检测
链接:https://arxiv.org/abs/2508.09165

作者:hang, Deyun Zhang, Yirao Tao, Kexin Wang, Shijia Geng, Jun Li, Qinghao Zhao, Xingpeng Liu, Yuxi Zhou, Shenda Hong
备注:18 pages, 6 figures
摘要:心电图是诊断心律失常等心血管疾病的重要工具。由于不同医院使用的心电图布局的差异,数字化信号呈现出异步的提前时间和部分停电损失,这对现有的模型提出了严峻的挑战。为了应对这一挑战,该研究引入了PatchECG,这是一种基于掩蔽训练策略的自适应可变块计数缺失表示学习框架,它自动聚焦于导联之间具有协作依赖关系的关键补丁,从而实现对不同布局的心电图中心律失常的关键识别。实验在PTB-XL数据集和使用ECG图像工具包工具生成的21388幅异步ECG图像上进行,使用23个子类作为标签。所提出的方法在不同的布局下表现出很强的鲁棒性,平均受试者工作特征曲线下面积(AUROC)为0.835,并且保持稳定(随着布局的变化而不变)。在基于朝阳医院400幅真实ECG图像数据的外部验证中,房颤诊断的AUROC达到0.778;在12 x 1布局ECG上,AUROC达到0.893。这一结果优于各种经典的插值和基线方法,与目前最优的大规模预训练模型ECGFounder相比,提高了0.111和0.19。
Abstract: The electrocardiogram (ECG) is an important tool for diagnosing cardiovascular diseases such as arrhythmia. Because different hospitals use different ECG layouts, the digitized signals exhibit asynchronous lead times and partial blackout loss, which poses a serious challenge to existing models. To address this challenge, this study introduces PatchECG, a framework for adaptive variable block count missing representation learning based on a masking training strategy, which automatically focuses on key patches with collaborative dependencies between leads, thereby achieving key recognition of arrhythmia in ECGs with different layouts. Experiments were conducted on the PTB-XL dataset and on 21388 asynchronous ECG images generated using the ECG image kit tool, with the 23 subclasses as labels. The proposed method demonstrated strong robustness under different layouts, with an average Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.835 that remained stable (unchanged with layout changes). In external validation on 400 real ECG images from Chaoyang Hospital, the AUROC for atrial fibrillation diagnosis reached 0.778; on 12 x 1 layout ECGs, the AUROC reached 0.893. This result is superior to various classic interpolation and baseline methods, and compared to the current optimal large-scale pre-training model ECGFounder, it represents improvements of 0.111 and 0.19.


【4】Presenting DiaData for Research on Type 1 Diabetes
标题:推出DiaData用于1型糖尿病研究
链接:https://arxiv.org/abs/2508.09160

作者:ar, Maria Maleshkova
备注:11 pages, 7 figures, 3 tables
摘要:1型糖尿病(T1 D)是一种自身免疫性疾病,导致胰岛素产生细胞的破坏,导致胰岛素缺乏,至于为什么受影响的个体依赖于外部胰岛素注射。然而,胰岛素可降低血糖水平,并可引起低血糖。低血糖症是一种严重的低血糖水平(70 mg/dL)事件,伴有头晕、昏迷或死亡的危险副作用。数据分析可以通过识别导致不良事件的个人模式和趋势来显着增强糖尿病护理。特别是,机器学习(ML)模型可以预测葡萄糖水平并提供早期警报。然而,糖尿病和低血糖症的研究受到大型数据集不可用的限制。因此,这项工作系统地整合了15个数据集,以提供每5分钟记录一次血糖测量值的2510名受试者的大型数据库。总共包括1.49亿个测量值,其中4%代表低血糖范围内的值。此外,提取了两个子数据库。子数据库I包括人口统计数据,子数据库II包括心率数据。综合数据集提供了性别和不同年龄层次的平等分布。作为进一步的贡献,数据质量进行了评估,显示数据不平衡和缺失值是一个重大挑战。此外,对血糖水平和心率数据进行了相关性研究,显示了低血糖前15分钟和55分钟之间的关系。
Abstract: Type 1 diabetes (T1D) is an autoimmune disorder that leads to the destruction of insulin-producing cells, resulting in insulin deficiency, which is why affected individuals depend on external insulin injections. However, insulin decreases blood glucose levels and can cause hypoglycemia. Hypoglycemia is a severe event of low blood glucose levels ($\le$70 mg/dL) with dangerous side effects of dizziness, coma, or death. Data analysis can significantly enhance diabetes care by identifying personal patterns and trends leading to adverse events. In particular, machine learning (ML) models can predict glucose levels and provide early alarms. However, diabetes and hypoglycemia research is limited by the unavailability of large datasets. Thus, this work systematically integrates 15 datasets to provide a large database of 2510 subjects with glucose measurements recorded every 5 minutes. In total, 149 million measurements are included, of which 4% represent values in the hypoglycemic range. Moreover, two sub-databases are extracted. Sub-database I includes demographics, and sub-database II includes heart rate data. The integrated dataset provides an equal distribution of sex and different age levels. As a further contribution, data quality is assessed, revealing that data imbalance and missing values present a significant challenge. Moreover, a correlation study on glucose levels and heart rate data is conducted, showing a relation between 15 and 55 minutes before hypoglycemia.
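
The figures in the abstract can be related with a short worked example: 4% of 149 million measurements is roughly 6 million hypoglycemic readings, and at one reading every 5 minutes the 15 to 55 minute window before an event spans 3 to 11 readings. The sketch below applies the stated threshold of 70 mg/dL to a list of readings; the function names and the sample values are hypothetical and are not drawn from DiaData.

```python
HYPO_THRESHOLD_MG_DL = 70  # threshold stated in the abstract (readings <= 70 mg/dL)


def hypoglycemic_fraction(readings_mg_dl):
    """Fraction of glucose readings at or below the hypoglycemia threshold."""
    if not readings_mg_dl:
        return 0.0
    low = sum(1 for g in readings_mg_dl if g <= HYPO_THRESHOLD_MG_DL)
    return low / len(readings_mg_dl)


def readings_before(minutes, interval_min=5):
    """Number of whole sampling intervals that fit in a lead time given in minutes."""
    return minutes // interval_min


# Made-up readings for illustration only.
sample = [65, 70, 71, 120]
fraction = hypoglycemic_fraction(sample)
```

A fraction near 0.04 on the real data would mean about 24 normal-range readings for every hypoglycemic one, which is the imbalance the abstract flags as a challenge for model training.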


【5】A Generative Imputation Method for Multimodal Alzheimer's Disease Diagnosis
标题:多模式阿尔茨海默病诊断的生成插补方法
链接:https://arxiv.org/abs/2508.09271

作者:Hassanzadeh, Anees Abrol, Hamid Reza Hassanzadeh, Vince D. Calhoun
摘要:多模态数据分析可以更准确地诊断大脑疾病,因为每种模态都增加了互补的信息。然而,在神经成像领域使用多模态数据集的主要挑战是不完整的数据,其中某些模态对于某些受试者是缺失的。因此,需要有效的策略来完成数据。传统的方法,如二次采样或零填充,可能会降低预测的准确性或引入意外的偏差。相比之下,生成模型等先进方法已经成为没有这些限制的有前途的解决方案。在这项研究中,我们提出了一种生成对抗网络方法,旨在从现有模式中重建缺失的模式,同时保留疾病模式。我们使用T1加权结构磁共振成像和功能网络连接作为两种模式。我们的研究结果显示,与传统方法相比,使用我们的生成插补方法时,阿尔茨海默病与认知正常组的分类准确性提高了9%。
摘要:Multimodal data analysis can lead to more accurate diagnoses of brain disorders due to the complementary information that each modality adds. However, a major challenge of using multimodal datasets in the neuroimaging field is incomplete data, where some of the modalities are missing for certain subjects. Hence, effective strategies are needed for completing the data. Traditional methods, such as subsampling or zero-filling, may reduce the accuracy of predictions or introduce unintended biases. In contrast, advanced methods such as generative models have emerged as promising solutions without these limitations. In this study, we proposed a generative adversarial network method designed to reconstruct missing modalities from existing ones while preserving the disease patterns. We used T1-weighted structural magnetic resonance imaging and functional network connectivity as two modalities. Our findings showed a 9% improvement in the classification accuracy for Alzheimer's disease versus cognitive normal groups when using our generative imputation method compared to the traditional approaches.


Distillation | Knowledge Extraction (3 papers)

【1】Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning
标题:超越比例定律:用于推理的数据高效蒸馏框架
链接:https://arxiv.org/abs/2508.09883

作者:u, Xiaoguang Jiang, Huiyang Li, Jucai Zhai, Dengfeng Liu, Qiaobo Hao, Huang Liu, Zhiguo Yang, Ji Xie, Ninglun Gu, Jin Yang, Kailai Zhang, Yelun Bao, Jun Wang
摘要:大型语言模型(LLM)在算法编码和数学问题解决等任务中表现出卓越的推理能力。最近的方法通过扩展语料库和结合强化学习和监督微调的多级训练来改进推理。尽管一些方法表明,小而有针对性的数据集可以通过蒸馏来激励推理,但推理比例律仍在形成,增加了计算成本。为了解决这个问题,我们提出了一个数据高效蒸馏框架(DED),优化帕累托前沿的推理蒸馏。受政策学习和强化学习的多样化推广策略的启发,我们的方法的关键思想有三个方面:(1)我们确定基准分数本身并不能决定有效的教师模型。通过对主导推理LLM的综合比较,我们提出了一种选择最优教师模型的方法。(2)虽然缩放蒸馏可以增强推理,但它通常会降低域外性能。一个精心策划的,较小的语料库实现了域内和域外能力之间的平衡。(3)不同的推理轨迹鼓励学生模型发展强大的推理技能。我们通过对数学推理(AIME 2024/2025,MATH-500)和代码生成(LiveCodeBench)的评估来验证我们的方法,仅用0.8k精心策划的示例就实现了最先进的结果,绕过了广泛扩展的需要。我们的系统分析表明,DED优于现有的方法,考虑因素超越表面硬度,令牌长度,或教师模型的能力。这项工作提供了一个实用和有效的途径,先进的推理,同时保留一般能力。
Abstract: Large language models (LLMs) demonstrate remarkable reasoning capabilities in tasks such as algorithmic coding and mathematical problem-solving. Recent methods have improved reasoning through expanded corpora and multistage training combining reinforcement learning and supervised fine-tuning. Although some methods suggest that a small but targeted dataset can incentivize reasoning via distillation alone, a reasoning scaling law is still taking shape, increasing computational costs. To address this, we propose a data-efficient distillation framework (DED) that optimizes the Pareto frontier of reasoning distillation. Inspired by the on-policy learning and diverse roll-out strategies of reinforcement learning, the key idea of our approach is threefold: (1) We identify that benchmark scores alone do not determine an effective teacher model. Through comprehensive comparisons of leading reasoning LLMs, we develop a method to select an optimal teacher model. (2) While scaling distillation can enhance reasoning, it often degrades out-of-domain performance. A carefully curated, smaller corpus achieves a balanced trade-off between in-domain and out-of-domain capabilities. (3) Diverse reasoning trajectories encourage the student model to develop robust reasoning skills. We validate our method through evaluations on mathematical reasoning (AIME 2024/2025, MATH-500) and code generation (LiveCodeBench), achieving state-of-the-art results with only 0.8k carefully curated examples, bypassing the need for extensive scaling. Our systematic analysis demonstrates that DED outperforms existing methods by considering factors beyond superficial hardness, token length, or teacher model capability. This work offers a practical and efficient pathway to advanced reasoning while preserving general capabilities.


【2】HyperKD: Distilling Cross-Spectral Knowledge in Masked Autoencoders via Inverse Domain Shift with Spatial-Aware Masking and Specialized Loss
标题:HyperKD:通过具有空间感知掩蔽和专门损失的逆域移位在掩蔽自动编码器中提取互谱知识
链接:https://arxiv.org/abs/2508.09453

作者:in, Tanjim Bin Faruk, Shrideep Pallickara, Sangmi Lee Pallickara
摘要:在大规模未标记数据集上进行预训练的基础模型的激增已经成为创建可适应和可重用架构的有效方法,这些架构可以用于使用卫星观测的各种下游任务。然而,由于固有的光谱差异和可用观测的稀缺性,它们直接应用于高光谱遥感仍然具有挑战性。在这项工作中,我们提出了HyperKD,一种新的知识蒸馏框架,使学习表示从教师模型到学生模型的有效发展的基础模型的高光谱图像。与典型的知识蒸馏框架不同,该框架使用复杂的教师来指导简单的学生,HyperKD在更简单的教师模型的指导下,实现了不同类型光谱数据之间知识转移的逆形式。HyperKD以Masked Autoencoder为基础,将Prithvi基础模型中的知识提取出来,为EnMAP高光谱图像量身定制。HyperKD通过引入基于特征的策略解决了具有光谱间隙的逆域自适应问题,该策略包括基于光谱范围的通道对齐,空间特征引导的掩蔽以及为高光谱图像量身定制的增强损失函数。HyperKD弥合了巨大的谱域差距,使预训练的基础模型能够有效地用于地理空间应用。大量的实验表明,HyperKD显着提高了MAE中的表示学习,从而增强了重建保真度,并在土地覆盖分类,作物类型识别和土壤有机碳预测等下游任务上表现得更加稳健,从而巩固了高光谱图像遥感分析中知识蒸馏框架的潜力。
摘要:The proliferation of foundation models, pretrained on large-scale unlabeled datasets, has emerged as an effective approach in creating adaptable and reusable architectures that can be leveraged for various downstream tasks using satellite observations. However, their direct application to hyperspectral remote sensing remains challenging due to inherent spectral disparities and the scarcity of available observations. In this work, we present HyperKD, a novel knowledge distillation framework that enables transferring learned representations from a teacher model into a student model for effective development of a foundation model on hyperspectral images. Unlike typical knowledge distillation frameworks, which use a complex teacher to guide a simpler student, HyperKD enables an inverse form of knowledge transfer across different types of spectral data, guided by a simpler teacher model. Building upon a Masked Autoencoder, HyperKD distills knowledge from the Prithvi foundational model into a student tailored for EnMAP hyperspectral imagery. HyperKD addresses the inverse domain adaptation problem with spectral gaps by introducing a feature-based strategy that includes spectral range-based channel alignment, spatial feature-guided masking, and an enhanced loss function tailored for hyperspectral images. HyperKD bridges the substantial spectral domain gap, enabling the effective use of pretrained foundation models for geospatial applications. Extensive experiments show that HyperKD significantly improves representation learning in MAEs, leading to enhanced reconstruction fidelity and more robust performance on downstream tasks such as land cover classification, crop type identification, and soil organic carbon prediction, underpinning the potential of knowledge distillation frameworks in remote sensing analytics with hyperspectral imagery.


【3】Pattern-based Knowledge Component Extraction from Student Code Using Representation Learning
Link: https://arxiv.org/abs/2508.09281

Authors: Hoq, Griffin Pitts, Andrew Lan, Peter Brusilovsky, Bita Akram
Abstract: Effective personalized learning in computer science education depends on accurately modeling what students know and what they need to learn. While Knowledge Components (KCs) provide a foundation for such modeling, automated KC extraction from student code is inherently challenging due to insufficient explainability of discovered KCs and the open-endedness of programming problems with significant structural variability across student solutions and complex interactions among programming concepts. In this work, we propose a novel, explainable framework for automated KC discovery through pattern-based KCs: recurring structural patterns within student code that capture the specific programming patterns and language constructs that students must master. Toward this, we train a Variational Autoencoder to generate important representative patterns from student code guided by an explainable, attention-based code representation model that identifies important correct and incorrect pattern implementations from student code. These patterns are then clustered to form pattern-based KCs. We evaluate our KCs using two well-established methods informed by Cognitive Science: learning curve analysis and Deep Knowledge Tracing (DKT). Experimental results demonstrate meaningful learning trajectories and significant improvements in DKT predictive performance over traditional KT methods. This work advances knowledge modeling in CS education by providing an automated, scalable, and explainable framework for identifying granular code patterns and algorithmic constructs, essential for student learning.


Autonomous driving | Vehicles | Lane detection, etc. (1 paper)

【1】NEXICA: Discovering Road Traffic Causality (Extended arXiv Version)
Link: https://arxiv.org/abs/2508.09447

Authors: Srikanth, John Krumm, Jonathan Qin
Comments: Extended version of short paper in 32nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2024)
Abstract: Road traffic congestion is a persistent problem. Focusing resources on the causes of congestion is a potentially efficient strategy for reducing slowdowns. We present NEXICA, an algorithm to discover which parts of the highway system tend to cause slowdowns on other parts of the highway. We use time series of road speeds as inputs to our causal discovery algorithm. Finding other algorithms inadequate, we develop a new approach that is novel in three ways. First, it concentrates on just the presence or absence of events in the time series, where an event indicates the temporal beginning of a traffic slowdown. Second, we develop a probabilistic model using maximum likelihood estimation to compute the probabilities of spontaneous and caused slowdowns between two locations on the highway. Third, we train a binary classifier to identify pairs of cause/effect locations trained on pairs of road locations where we are reasonably certain a priori of their causal connections, both positive and negative. We test our approach on six months of road speed data from 195 different highway speed sensors in the Los Angeles area, showing that our approach is superior to state-of-the-art baselines in both accuracy and computation speed.
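The first step in the abstract above reduces each speed time series to the onsets of slowdowns. A minimal sketch of that reduction follows. It assumes a fixed speed threshold marks a slowdown, which the abstract does not specify; the function name `slowdown_onsets` and the threshold value are hypothetical, not taken from the paper.

```python
def slowdown_onsets(speeds, threshold):
    """Return the indices where a slowdown begins.

    Assumption (not from the paper): a reading is "slow" when it falls
    below `threshold`, and an event is the first slow reading after a
    non-slow one. A series that starts slow yields an onset at index 0.
    """
    onsets = []
    was_slow = False
    for t, speed in enumerate(speeds):
        is_slow = speed < threshold
        if is_slow and not was_slow:
            onsets.append(t)  # record only the beginning of the slowdown
        was_slow = is_slow
    return onsets


# Hypothetical speed readings (mph) from a single sensor.
readings = [65, 64, 40, 35, 60, 62, 30, 28, 27, 66]
print(slowdown_onsets(readings, threshold=45))  # prints [2, 6]
```

Keeping only onsets turns a dense speed series into a sparse event sequence, which is the kind of input a probabilistic model of spontaneous versus caused slowdowns would consume.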


Point clouds | SLAM | Radar | LiDAR | Depth/RGB-D (1 paper)

【1】RayletDF: Raylet Distance Fields for Generalizable 3D Surface Reconstruction from Point Clouds or Gaussians
Link: https://arxiv.org/abs/2508.09830

Authors: Wei, Jinxi Li, Yafei Yang, Siyuan Zhou, Bo Yang
Comments: ICCV 2025 Highlight. Shenxing and Jinxi are co-first authors. Code and data are available at: this https URL
Abstract: In this paper, we present a generalizable method for 3D surface reconstruction from raw point clouds or pre-estimated 3D Gaussians by 3DGS from RGB images. Unlike existing coordinate-based methods, which are often computationally intensive when rendering explicit surfaces, our proposed method, named RayletDF, introduces a new technique called raylet distance field, which aims to directly predict surface points from query rays. Our pipeline consists of three key modules: a raylet feature extractor, a raylet distance field predictor, and a multi-raylet blender. These components work together to extract fine-grained local geometric features, predict raylet distances, and aggregate multiple predictions to reconstruct precise surface points. We extensively evaluate our method on multiple public real-world datasets, demonstrating superior performance in surface reconstruction from point clouds or 3D Gaussians. Most notably, our method achieves exceptional generalization ability, successfully recovering 3D surfaces in a single forward pass across unseen datasets in testing.


Federated learning | Privacy protection | Encryption (2 papers)

【1】Large-Small Model Collaborative Framework for Federated Continual Learning
Link: https://arxiv.org/abs/2508.09489

Authors: in Yang, Boyang Fan, Xuemei Cao, Hanlin Gu, Lixin Fan, Qiang Yang
Abstract: Continual learning (CL) for Foundation Models (FMs) is an essential yet underexplored challenge, especially in Federated Continual Learning (FCL), where each client learns from a private, evolving task stream under strict data and communication constraints. Despite their powerful generalization abilities, FMs often exhibit suboptimal performance on local downstream tasks, as they are unable to utilize private local data. Furthermore, enabling FMs to learn new tasks without forgetting prior knowledge is inherently a challenging problem, primarily due to their immense parameter count and high model complexity. In contrast, small models can be trained locally under resource-constrained conditions and benefit from more mature CL techniques. To bridge the gap between small models and FMs, we propose the first collaborative framework in FCL, where lightweight local models act as a dynamic bridge, continually adapting to new tasks while enhancing the utility of the large model. Two novel components are also included: Small Model Continual Fine-tuning prevents small models from temporal forgetting; One-by-One Distillation performs personalized fusion of heterogeneous local knowledge on the server. Experimental results demonstrate its superior performance, even when clients utilize heterogeneous small models.


【2】Long-Term Client Selection for Federated Learning with Non-IID Data: A Truthful Auction Approach
Link: https://arxiv.org/abs/2508.09181

Authors: Tan, Zhian Liu, Kun Guo, Mingxiong Zhao
Abstract: Federated learning (FL) provides a decentralized framework that enables universal model training through collaborative efforts on mobile nodes, such as smart vehicles in the Internet of Vehicles (IoV). Each smart vehicle acts as a mobile client, contributing to the process without uploading local data. This method leverages non-independent and identically distributed (non-IID) training data from different vehicles, influenced by various driving patterns and environmental conditions, which can significantly impact model convergence and accuracy. Although client selection can be a feasible solution for non-IID issues, it faces challenges related to selection metrics. Traditional metrics evaluate client data quality independently per round and require client selection after all clients complete local training, leading to resource wastage from unused training results. In the IoV context, where vehicles have limited connectivity and computational resources, information asymmetry in client selection risks clients submitting false information, potentially making the selection ineffective. To tackle these challenges, we propose a novel Long-term Client-Selection Federated Learning based on Truthful Auction (LCSFLA). This scheme maximizes social welfare with consideration of long-term data quality using a new assessment mechanism and energy costs, and the advised auction mechanism with a deposit requirement incentivizes client participation and ensures information truthfulness. We theoretically prove the incentive compatibility and individual rationality of the advised incentive mechanism. Experimental results on various datasets, including those from IoV scenarios, demonstrate its effectiveness in mitigating performance degradation caused by non-IID data.


Reasoning | Analysis | Understanding | Interpretation (10 papers)

【1】Feature Impact Analysis on Top Long-Jump Performances with Quantile Random Forest and Explainable AI Techniques
Link: https://arxiv.org/abs/2508.09810

Authors: tephan Clémençon, Mounîm A.El-Yacoubi, Sao Mai Nguyen, Eric Fenaux, Ons Jelassi
Comments: 15 pages, 6 figures
Abstract: Biomechanical features have become important indicators for evaluating athletes' techniques. Traditionally, experts propose significant features and evaluate them using physics equations. However, the complexity of the human body and its movements makes it challenging to explicitly analyze the relationships between some features and athletes' final performance. With advancements in modern machine learning and statistics, data analytics methods have gained increasing importance in sports analytics. In this study, we leverage machine learning models to analyze expert-proposed biomechanical features from the finals of long jump competitions in the World Championships. The objectives of the analysis include identifying the most important features contributing to top-performing jumps and exploring the combined effects of these key features. Using quantile regression, we model the relationship between the biomechanical feature set and the target variable (effective distance), with a particular focus on elite-level jumps. To interpret the model, we apply SHapley Additive exPlanations (SHAP) alongside Partial Dependence Plots (PDPs) and Individual Conditional Expectation (ICE) plots. The findings reveal that, beyond the well-documented velocity-related features, specific technical aspects also play a pivotal role. For male athletes, the angle of the knee of the supporting leg before take-off is identified as a key factor for achieving top 10% performance in our dataset, with angles greater than 169° contributing significantly to jump performance. In contrast, for female athletes, the landing pose and approach step technique emerge as the most critical features influencing top 10% performances, alongside velocity. This study establishes a framework for analyzing the impact of various features on athletic performance, with a particular emphasis on top-performing events.
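Quantile regression, as used in the abstract above to target elite-level jumps, is commonly scored with the pinball (quantile) loss. The sketch below shows that loss only. It is not the authors' quantile random forest, and the jump distances are made-up values for illustration.

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball loss of one prediction at quantile level q, with 0 < q < 1."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    diff = y_true - y_pred
    # Under-prediction (diff > 0) costs q per unit of error;
    # over-prediction (diff < 0) costs (1 - q) per unit of error.
    return q * diff if diff >= 0 else (q - 1.0) * diff


def mean_pinball_loss(ys, preds, q):
    """Average pinball loss over paired observations and predictions."""
    return sum(pinball_loss(y, p, q) for y, p in zip(ys, preds)) / len(ys)


# Hypothetical effective distances in metres (not data from the paper).
distances = [7.9, 8.1, 8.3, 8.5]
high_guess = mean_pinball_loss(distances, [8.5] * 4, q=0.9)
low_guess = mean_pinball_loss(distances, [7.9] * 4, q=0.9)
print(high_guess < low_guess)  # prints True
```

At q = 0.9 an under-prediction costs nine times as much as an over-prediction of the same size, so the loss pushes the fitted value toward the top of the distribution, which is why a high quantile suits an analysis focused on top 10% performances.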


【2】Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning
Link: https://arxiv.org/abs/2508.09726

Authors: Shrivastava, Ahmed Awadallah, Vidhisha Balachandran, Shivam Garg, Harkirat Behl, Dimitris Papailiopoulos
Abstract: Large language models trained with reinforcement learning with verifiable rewards tend to trade accuracy for length--inflating response lengths to achieve gains in accuracy. While longer answers may be warranted for harder problems, many tokens are merely "filler": repetitive, verbose text that makes no real progress. We introduce GFPO (Group Filtered Policy Optimization), which curbs this length explosion by sampling larger groups per problem during training and filtering responses to train on based on two key metrics: (1) response length and (2) token efficiency: reward per token ratio. By sampling more at training time, we teach models to think less at inference time. On the Phi-4-reasoning model, GFPO cuts GRPO's length inflation by 46-71% across challenging STEM and coding benchmarks (AIME 24/25, GPQA, Omni-MATH, LiveCodeBench) while maintaining accuracy. Optimizing for reward per token further increases reductions in length inflation to 71-85%. We also propose Adaptive Difficulty GFPO, which dynamically allocates more training resources to harder problems based on real-time difficulty estimates, improving the balance between computational efficiency and accuracy especially on difficult questions. GFPO demonstrates that increased training-time compute directly translates to reduced test-time compute--a simple yet effective trade-off for efficient reasoning.
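The filtering step described in the abstract above can be sketched as follows. The two ranking metrics come from the abstract. Everything else is an assumption made for illustration: the function name `gfpo_advantages`, normalizing rewards by the mean and standard deviation of the retained subset (GRPO-style), and giving rejected responses an advantage of zero.

```python
from statistics import mean, pstdev


def gfpo_advantages(lengths, rewards, k, metric="length"):
    """Keep the k best responses of a sampled group and score only those.

    lengths: token count of each sampled response (assumed positive).
    rewards: verifiable reward of each sampled response.
    Returns one advantage per response; filtered-out responses get 0.0.
    """
    if len(lengths) != len(rewards):
        raise ValueError("lengths and rewards must have the same size")
    if not 0 < k <= len(lengths):
        raise ValueError("k must be between 1 and the group size")
    if metric == "length":
        scores = [-n for n in lengths]  # shorter responses rank first
    elif metric == "token_efficiency":
        scores = [r / n for r, n in zip(rewards, lengths)]  # reward per token
    else:
        raise ValueError(f"unknown metric: {metric}")

    ranked = sorted(range(len(lengths)), key=lambda i: scores[i], reverse=True)
    kept = ranked[:k]
    kept_rewards = [rewards[i] for i in kept]
    mu = mean(kept_rewards)
    sigma = pstdev(kept_rewards) or 1.0  # assumed guard against zero spread

    advantages = [0.0] * len(lengths)
    for i in kept:
        advantages[i] = (rewards[i] - mu) / sigma
    return advantages


# Hypothetical group of four sampled responses to one problem.
print(gfpo_advantages([100, 400, 200, 800], [1.0, 1.0, 0.0, 1.0], k=2))
# prints [1.0, 0.0, -1.0, 0.0]
```

Because rejected responses contribute no gradient signal in this sketch, the policy is only ever rewarded relative to other short (or token-efficient) samples, which is one plausible reading of how sampling more at training time discourages filler at inference time.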


【3】Global Convergence Analysis of Vanilla Gradient Descent for Asymmetric Matrix Completion
Link: https://arxiv.org/abs/2508.09685

Authors: Shuo Chen, Jinsheng Li, Xiangying Pang, Maoguo Gong
Abstract: This paper investigates the asymmetric low-rank matrix completion problem, which can be formulated as an unconstrained non-convex optimization problem with a nonlinear least-squares objective function, and is solved via gradient descent methods. Previous gradient descent approaches typically incorporate regularization terms into the objective function to guarantee convergence. However, numerical experiments and theoretical analysis of the gradient flow both demonstrate that the elimination of regularization terms in gradient descent algorithms does not adversely affect convergence performance. By introducing the leave-one-out technique, we inductively prove that the vanilla gradient descent with spectral initialization achieves a linear convergence rate with high probability. Besides, we demonstrate that the balancing regularization term exhibits a small norm during iterations, which reveals the implicit regularization property of gradient descent. Empirical results show that our algorithm has a lower computational cost while maintaining comparable completion performance compared to other gradient descent algorithms.
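For readers unfamiliar with the setup in the abstract above, one common way to write it down is given here. The notation is an assumption, not taken from the paper: $M \in \mathbb{R}^{n_1 \times n_2}$ is the rank-$r$ target, $\Omega$ the set of observed entries, $\mathcal{P}_\Omega$ the projection that zeroes unobserved entries, $p$ the sampling rate, and $\eta$ the step size. The paper's exact scaling and constants may differ.

```latex
\begin{aligned}
f(X, Y) &= \frac{1}{2p}\,\bigl\| \mathcal{P}_\Omega\bigl(X Y^\top - M\bigr) \bigr\|_F^2,
  \qquad X \in \mathbb{R}^{n_1 \times r},\; Y \in \mathbb{R}^{n_2 \times r}, \\
X_{t+1} &= X_t - \frac{\eta}{p}\, \mathcal{P}_\Omega\bigl(X_t Y_t^\top - M\bigr)\, Y_t, \\
Y_{t+1} &= Y_t - \frac{\eta}{p}\, \mathcal{P}_\Omega\bigl(X_t Y_t^\top - M\bigr)^{\top} X_t, \\
(X_0, Y_0) &= \bigl(U \Sigma^{1/2},\, V \Sigma^{1/2}\bigr),
  \quad \text{where } U \Sigma V^\top \text{ is the top-} r \text{ SVD of } \tfrac{1}{p}\,\mathcal{P}_\Omega(M), \\
g(X, Y) &= \bigl\| X^\top X - Y^\top Y \bigr\|_F^2 .
\end{aligned}
```

The first line is the unregularized least-squares objective, the next two are plain gradient steps on it, and the fourth is the usual spectral initialization. The last line is the balancing term that earlier methods add to the objective; the abstract's claim is that this quantity stays small along the iterates even when it is left out.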


【4】Thermal Tracks: A Gaussian process-based framework for universal melting curve analysis enabling unconstrained hit identification in thermal proteome profiling experiments
Link: https://arxiv.org/abs/2508.09659

Authors: F. Hevler, Shivam Verma, Mirat Soijtra, Carolyn R. Bertozzi
Comments: 5 pages, 2 figures, short communication
Abstract: Thermal Tracks is a Python-based statistical framework for analyzing protein thermal stability data that overcomes key limitations of existing thermal proteome profiling (TPP) workflows. Unlike standard approaches that assume sigmoidal melting curves and are constrained by empirical null distributions (limiting significant hits to approximately 5% of data), Thermal Tracks uses Gaussian Process (GP) models with squared-exponential kernels to flexibly model any melting curve shape while generating unbiased null distributions through kernel priors. This framework is particularly valuable for analyzing proteome-wide perturbations that significantly alter protein thermal stability, such as pathway inhibitions, genetic modifications, or environmental stresses, where conventional TPP methods may miss biologically relevant changes due to their statistical constraints. Furthermore, Thermal Tracks excels at analyzing proteins with unconventional melting profiles, including phase-separating proteins and membrane proteins, which often exhibit complex, non-sigmoidal thermal stability behaviors. Thermal Tracks is freely available from GitHub and is implemented in Python, providing an accessible and flexible tool for proteome-wide thermal profiling studies.
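The squared-exponential kernel named in the abstract above has a standard closed form, sketched here in plain Python. This is not the Thermal Tracks API; the function names, hyperparameter values, and temperature grid are all hypothetical.

```python
import math


def squared_exponential(x1, x2, variance=1.0, length_scale=1.0):
    """k(x1, x2) = variance * exp(-(x1 - x2)^2 / (2 * length_scale^2))."""
    return variance * math.exp(-((x1 - x2) ** 2) / (2.0 * length_scale ** 2))


def kernel_matrix(xs, variance=1.0, length_scale=1.0):
    """Covariance matrix of the GP prior evaluated at the points xs."""
    return [
        [squared_exponential(a, b, variance, length_scale) for b in xs]
        for a in xs
    ]


# Hypothetical temperature grid in degrees Celsius.
temperatures = [37.0, 41.0, 45.0, 49.0]
K = kernel_matrix(temperatures, variance=2.0, length_scale=5.0)
for row in K:
    print([round(value, 3) for value in row])
```

The kernel only states that nearby temperatures have similar soluble fractions, with similarity decaying smoothly over the length scale. It imposes no sigmoid shape, which is what lets a GP follow non-sigmoidal melting curves.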


【5】TimeMKG: Knowledge-Infused Causal Reasoning for Multivariate Time Series Modeling
Link: https://arxiv.org/abs/2508.09630

Authors: , Junming Liu, Ding Wang, Yirong Chen, Xuefeng Yan
Abstract: Multivariate time series data typically comprises two distinct modalities: variable semantics and sampled numerical observations. Traditional time series models treat variables as anonymous statistical signals, overlooking the rich semantic information embedded in variable names and data descriptions. However, these textual descriptors often encode critical domain knowledge that is essential for robust and interpretable modeling. Here we present TimeMKG, a multimodal causal reasoning framework that elevates time series modeling from low-level signal processing to knowledge-informed inference. TimeMKG employs large language models to interpret variable semantics and constructs structured Multivariate Knowledge Graphs that capture inter-variable relationships. A dual-modality encoder separately models the semantic prompts, generated from knowledge graph triplets, and the statistical patterns from historical time series. Cross-modality attention aligns and fuses these representations at the variable level, injecting causal priors into downstream tasks such as forecasting and classification, providing explicit and interpretable priors to guide model reasoning. Experiments on diverse datasets demonstrate that incorporating variable-level knowledge significantly improves both predictive performance and generalization.


【6】Over-Squashing in GNNs and Causal Inference of Rewiring Strategies
Link: https://arxiv.org/abs/2508.09265

Authors: ber, Amirali Salehi-Abari
Comments: 14 pages, 2 figures
Abstract: Graph neural networks (GNNs) have exhibited state-of-the-art performance across a wide range of domains such as recommender systems, material design, and drug repurposing. Yet message-passing GNNs suffer from over-squashing -- exponential compression of long-range information from distant nodes -- which limits expressivity. Rewiring techniques can ease this bottleneck, but their practical impacts are unclear due to the lack of a direct empirical over-squashing metric. We propose a rigorous, topology-focused method for assessing over-squashing between node pairs using the decay rate of their mutual sensitivity. We then extend these pairwise assessments to four graph-level statistics (prevalence, intensity, variability, extremity). Coupling these metrics with a within-graph causal design, we quantify how rewiring strategies affect over-squashing on diverse graph- and node-classification benchmarks. Our extensive empirical analyses show that most graph classification datasets suffer from over-squashing (but to various extents), and rewiring effectively mitigates it -- though the degree of mitigation, and its translation into performance gains, varies by dataset and method. We also found that over-squashing is less notable in node classification datasets, where rewiring often increases over-squashing, and performance variations are uncorrelated with over-squashing changes. These findings suggest that rewiring is most beneficial when over-squashing is both substantial and corrected with restraint -- while overly aggressive rewiring, or rewiring applied to minimally over-squashed graphs, is unlikely to help and may even harm performance. Our plug-and-play diagnostic tool lets practitioners decide -- before any training -- whether rewiring is likely to pay off.


【7】JustDense: Just using Dense instead of Sequence Mixer for Time Series analysis
Link: https://arxiv.org/abs/2508.09153

Authors: Park, Yongjae Lee, Daesan Park, Dohee Kim, Hyerim Bae
Comments: 13 pages, planning to submit to IEEE BigData 2025
Abstract: Sequence and channel mixers, the core mechanism in sequence models, have become the de facto standard in time series analysis (TSA). However, recent studies have questioned the necessity of complex sequence mixers, such as attention mechanisms, demonstrating that simpler architectures can achieve comparable or even superior performance. This suggests that the benefits attributed to complex sequence mixers might instead emerge from other architectural or optimization factors. Based on this observation, we pose a central question: Are common sequence mixers necessary for time-series analysis? To answer it, we propose JustDense, an empirical study that systematically replaces sequence mixers in various well-established TSA models with dense layers. Grounded in the MatrixMixer framework, JustDense treats any sequence mixer as a mixing matrix and replaces it with a dense layer. This substitution isolates the mixing operation, enabling a clear theoretical foundation for understanding its role. We conducted extensive experiments on 29 benchmarks covering five representative TSA tasks using seven state-of-the-art TSA models to address our research question. The results show that replacing sequence mixers with dense layers yields comparable or even superior performance. In the cases where dedicated sequence mixers still offer benefits, JustDense challenges the assumption that "deeper and more complex architectures are inherently better" in TSA.


【8】5G Core Fault Detection and Root Cause Analysis using Machine Learning and Generative AI
Link: https://arxiv.org/abs/2508.09152

Authors: R. Isaac, Harish Saradagam, Nallamothu Pardhasaradhi
Comments: 8 pages, 3 figures and 2 tables. Accepted in Conference on Advances in Communication Networks & Systems (CoaCoNS 2025)
Abstract: With the advent of 5G networks and technologies, ensuring the integrity and performance of packet core traffic is paramount. During network analysis, test files such as Packet Capture (PCAP) files and log files will contain any errors present in the system, and these must be resolved for better overall network performance, such as connectivity strength and handover quality. Current methods require numerous person-hours to sort out testing results and find the faults. This paper presents a novel AI/ML-driven Fault Analysis (FA) Engine designed to classify successful and faulty frames in PCAP files, specifically within the 5G packet core. The FA Engine analyzes network traffic using natural language processing techniques to identify anomalies and inefficiencies, significantly reducing the time and effort required and increasing efficiency. The FA Engine also suggests steps to fix the issue using Generative AI via a Large Language Model (LLM) trained on several 5G packet core documents. The engine explains the details of the error from the domain perspective using documents such as the 3GPP standards and user documents regarding the internal conditions of the tests. Test results on the ML models show high classification accuracy on the test dataset when trained with 80-20 splits for the successful and failed PCAP files. Future work includes extending the AI engine to incorporate 4G network traffic and other forms of network data, such as log text files and multimodal systems.


【9】MoLAN: A Unified Modality-Aware Noise Dynamic Editing Framework for Multimodal Sentiment Analysis
Link: https://arxiv.org/abs/2508.09145

Authors: , Yongkang Liu, Dexian Cai, Shi Feng, Xiaocui Yang, Daling Wang, Yifei Zhang
Abstract: Multimodal Sentiment Analysis aims to integrate information from various modalities, such as audio, visual, and text, to make complementary predictions. However, it often struggles with irrelevant or misleading visual and auditory information. Most existing approaches typically treat the entire modality information (e.g., a whole image, audio segment, or text paragraph) as an independent unit for feature enhancement or denoising. They often suppress redundant and noisy information at the risk of losing critical information. To address this challenge, we propose MoLAN, a unified ModaLity-aware noise dynAmic editiNg framework. Specifically, MoLAN performs modality-aware blocking by dividing the features of each modality into multiple blocks. Each block is then dynamically assigned a distinct denoising strength based on its noise level and semantic relevance, enabling fine-grained noise suppression while preserving essential multimodal information. Notably, MoLAN is a unified and flexible framework that can be seamlessly integrated into a wide range of multimodal models. Building upon this framework, we further introduce MoLAN+, a new multimodal sentiment analysis approach. Experiments across five models and four datasets demonstrate the broad effectiveness of the MoLAN framework. Extensive evaluations show that MoLAN+ achieves state-of-the-art performance. The code is publicly available at https://github.com/betterfly123/MoLAN-Framework.


【10】Forecasting Binary Economic Events in Modern Mercantilism: Traditional methodologies coupled with PCA and K-means Quantitative Analysis of Qualitative Sentimental Data
Link: https://arxiv.org/abs/2508.09243

Authors: Kot
Abstract: This paper examines Modern Mercantilism, characterized by rising economic nationalism, strategic technological decoupling, and geopolitical fragmentation, as a disruptive shift from the post-1945 globalization paradigm. It applies Principal Component Analysis (PCA) to 768-dimensional SBERT-generated semantic embeddings of curated news articles to extract orthogonal latent factors that discriminate binary event outcomes linked to protectionism, technological sovereignty, and bloc realignments. Analysis of principal component loadings identifies key semantic features driving classification performance, enhancing interpretability and predictive accuracy. This methodology provides a scalable, data-driven framework for quantitatively tracking emergent mercantilist dynamics through high-dimensional text analytics.


Detection (4 papers)

【1】Enhance the machine learning algorithm performance in phishing detection with keyword features
Link: https://arxiv.org/abs/2508.09765

Authors: ang
Abstract: Phishing attacks on the Internet have increased significantly in recent years. In a typical phishing attack, the attacker sets up a malicious website that looks similar to a legitimate website in order to obtain end users' information. This may lead to the leakage of sensitive information and to financial loss for end users. To avoid such attacks, early detection of these websites' URLs is vital. Previous researchers have proposed many machine learning algorithms to distinguish phishing URLs from legitimate ones. In this paper, we enhance these machine learning algorithms from the perspective of feature selection. We propose a novel method that combines keyword features with traditional features. The method is applied to multiple traditional machine learning algorithms, and the experimental results show that it is useful and effective. On average, it reduces the classification error by 30% on the large dataset, and its enhancement is more significant on the small dataset. In addition, the method extracts information from the URL alone and does not rely on additional information provided by third-party services. The best result for a machine learning algorithm using our proposed method is an accuracy of 99.68%.
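To make the idea in the abstract above concrete, the sketch below builds a feature dictionary from the URL string alone, combining a few conventional lexical features with keyword indicators. The abstract names neither its keywords nor its traditional features, so the keyword list, the feature names, and the function `url_features` are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical keyword list, chosen only for illustration.
KEYWORDS = ("login", "verify", "secure", "account", "update")


def url_features(url):
    """Lexical and keyword features computed from the URL string only."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    text = url.lower()
    features = {
        # Assumed "traditional" lexical features.
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
    }
    # Keyword features: 1 if the keyword occurs anywhere in the URL.
    for word in KEYWORDS:
        features[f"kw_{word}"] = int(word in text)
    return features


print(url_features("http://secure-login.example.com/verify?id=1"))
```

The resulting dictionary can be fed to any standard classifier after vectorization. Because every feature is derived from the URL itself, no lookup against a third-party service is needed, which matches the constraint stated in the abstract.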


【2】Anomaly Detection for IoT Global Connectivity
标题:物联网全球连接异常检测
链接:https://arxiv.org/abs/2508.09660

作者:ña Iglesias, Carlos Segura Perales, Stefan Geißler, Diego Perino, Andra Lutu
摘要:物联网(IoT)应用提供商依赖移动网络运营商(MNO)和漫游基础设施在全球范围内提供服务。在这个复杂的生态系统中,端到端通信路径贯穿多个实体,保证通信可用性和可靠性变得越来越具有挑战性。此外,大多数平台运营商对沟通问题采取被动的态度,只有在事件变得严重之后才对用户投诉做出回应,从而影响了服务质量。本文介绍了我们在设计和部署ANCHOR方面的经验-这是一种用于大型全球漫游平台的物联网连接服务的无监督异常检测解决方案。ANCHOR通过过滤大量数据来帮助工程师识别潜在的问题客户端(即,那些存在影响其多个物联网设备的连接问题的人),在服务受到严重影响之前实现主动解决问题。我们首先描述我们运营的物联网连接提供商的物联网服务、基础设施和网络可见性。其次,我们描述了在这个平台上设计无监督异常检测解决方案的主要挑战和操作要求。根据这些指导方针,我们提出了不同的统计规则,以及基于被动信令流量的物联网垂直异常检测的机器和深度学习模型。我们描述了我们与运营团队在运营平台上设计和评估我们的解决方案时所遵循的步骤,并报告了对运营物联网客户的评估。
摘要:Internet of Things (IoT) application providers rely on Mobile Network Operators (MNOs) and roaming infrastructures to deliver their services globally. In this complex ecosystem, where the end-to-end communication path traverses multiple entities, it has become increasingly challenging to guarantee communication availability and reliability. Further, most platform operators use a reactive approach to communication issues, responding to user complaints only after incidents have become severe, compromising service quality. This paper presents our experience in the design and deployment of ANCHOR -- an unsupervised anomaly detection solution for the IoT connectivity service of a large global roaming platform. ANCHOR assists engineers by filtering vast amounts of data to identify potential problematic clients (i.e., those with connectivity issues affecting several of their IoT devices), enabling proactive issue resolution before the service is critically impacted. We first describe the IoT service, infrastructure, and network visibility of the IoT connectivity provider we operate. Second, we describe the main challenges and operational requirements for designing an unsupervised anomaly detection solution on this platform. Following these guidelines, we propose different statistical rules, and machine- and deep-learning models for IoT verticals anomaly detection based on passive signaling traffic. We describe the steps we followed working with the operational teams on the design and evaluation of our solution on the operational platform, and report an evaluation on operational IoT customers.


【3】Detection of Odor Presence via Deep Neural Networks
标题:基于深度神经网络的气味检测
链接:https://arxiv.org/abs/2508.09264

作者:sanloo, Ali Zareh, Mehmet Kemal Özdemir
摘要:气味检测是食品安全、环境监测、医疗诊断等许多领域的基础。目前为气味检测开发的人工传感器与复杂的混合物斗争,而非侵入性记录缺乏可靠的单次试验保真度。为了开发一个通用的气味检测系统,在这项研究中,我们提出了一个初步的工作,我们的目标是测试两个假设:(i)局部场电位(LFPs)的光谱特征是足够强大的单次试验气味检测和(ii)信号从嗅球单独是足够的。为了测试两个假设,我们提出了一个互补的一维卷积网络(ResCNN和AttentionCNN)的集合,它可以从多通道嗅球LFP中解码气味的存在。在来自7只清醒小鼠的2,349次试验中进行了测试,我们最终的集成模型支持这两种假设,平均准确率为86.6%,F1得分为81.0%,AUC为0.9247,大大优于以前的基准。此外,t-SNE可视化证实了我们的框架捕获了具有生物学意义的签名。这些发现确立了对细胞外LFP气味的存在进行稳健的单次试验检测的可行性,并证明了深度学习模型在更深入地了解嗅觉表征方面的潜力。
摘要:Odor detection underpins food safety, environmental monitoring, medical diagnostics, and many more fields. The current artificial sensors developed for odor detection struggle with complex mixtures while non-invasive recordings lack reliable single-trial fidelity. To develop a general system for odor detection, in this study we present a preliminary work where we aim to test two hypotheses: (i) that spectral features of local field potentials (LFPs) are sufficient for robust single-trial odor detection and (ii) that signals from the olfactory bulb alone are adequate. To test two hypotheses, we propose an ensemble of complementary one-dimensional convolutional networks (ResCNN and AttentionCNN) that decodes the presence of odor from multichannel olfactory bulb LFPs. Tested on 2,349 trials from seven awake mice, our final ensemble model supports both hypotheses, achieving a mean accuracy of 86.6%, an F1-score of 81.0%, and an AUC of 0.9247, substantially outperforming previous benchmarks. In addition, the t-SNE visualization confirms that our framework captures biologically significant signatures. These findings establish the feasibility of robust single-trial detection of the presence of odor from extracellular LFPs, as well as demonstrate the potential of deep learning models to provide a deeper understanding of olfactory representations.


【4】Fake-Mamba: Real-Time Speech Deepfake Detection Using Bidirectional Mamba as Self-Attention's Alternative
标题:Fake-Mamba:使用双向Mamba作为自我注意替代方案的实时语音深度伪造检测
链接:https://arxiv.org/abs/2508.09294

作者:Zimo Zhu, Wenxin Zhang, Yi-Cheng Lin, Tomi Kinnunen
备注:Accepted at IEEE ASRU 2025
摘要:语音合成的进步加剧了安全威胁,激发了实时深度伪造检测研究。我们研究双向Mamba是否可以作为Self-Attention检测合成语音的竞争替代方案。我们的解决方案Fake-Mamba将XLSR前端与双向Mamba集成,以捕获本地和全局工件。我们的核心创新引入了三种高效编码器:TransBiMamba、ConBiMamba和PN-BiMamba。利用XLSR丰富的语言表示,PN-BiMamba可以有效地捕捉合成语音的微妙线索。在ASVspoof 21 LA、21 DF和In-The-Wild基准测试中,Fake-Mamba分别达到0.97%、1.74%和5.85%的EER,这代表了相对于SOTA模型XLSR-Conformer和XLSR-Mamba的实质性相对增益。该框架保持跨话语长度的实时推理,表现出很强的泛化能力和实际可行性。该代码可在https://github.com/xuanxixi/Fake-Mamba上获得。
摘要:Advances in speech synthesis intensify security threats, motivating real-time deepfake detection research. We investigate whether bidirectional Mamba can serve as a competitive alternative to Self-Attention in detecting synthetic speech. Our solution, Fake-Mamba, integrates an XLSR front-end with bidirectional Mamba to capture both local and global artifacts. Our core innovation introduces three efficient encoders: TransBiMamba, ConBiMamba, and PN-BiMamba. Leveraging XLSR's rich linguistic representations, PN-BiMamba can effectively capture the subtle cues of synthetic speech. Evaluated on ASVspoof 21 LA, 21 DF, and In-The-Wild benchmarks, Fake-Mamba achieves 0.97%, 1.74%, and 5.85% EER, respectively, representing substantial relative gains over SOTA models XLSR-Conformer and XLSR-Mamba. The framework maintains real-time inference across utterance lengths, demonstrating strong generalization and practical viability. The code is available at https://github.com/xuanxixi/Fake-Mamba.
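
The results above are reported as equal error rate (EER), the operating point where the false acceptance rate equals the false rejection rate. The sketch below approximates it by scanning the observed scores as thresholds. The function name, the score convention (higher means bona fide), and the label encoding are assumptions for illustration, not the evaluation code of the paper or of ASVspoof.

```python
def equal_error_rate(scores, labels):
    """Approximate EER from detection scores.

    scores: higher means "more likely bona fide".
    labels: 1 for bona fide speech, 0 for spoofed speech.
    """
    bona = [s for s, y in zip(scores, labels) if y == 1]
    spoof = [s for s, y in zip(scores, labels) if y == 0]
    best_gap, eer = None, None
    for t in sorted(set(scores)):
        far = sum(s >= t for s in spoof) / len(spoof)  # spoofed samples accepted
        frr = sum(s < t for s in bona) / len(bona)     # bona fide samples rejected
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With a finite score set the two rates rarely coincide exactly, so the sketch reports their average at the closest threshold; production toolkits usually interpolate instead.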


分类|识别(2篇)

【1】A Unified Contrastive-Generative Framework for Time Series Classification
标题:时间序列分类的统一对比生成框架
链接:https://arxiv.org/abs/2508.09451

作者: Azadeh Alavi, Minyi Li, Xiang Zhang
摘要:多变量时间序列的自监督学习(SSL)主要包括两种范式:擅长实例判别的对比方法和对数据分布建模的生成方法。虽然它们各自有效,但其互补潜力仍未得到开发。我们提出了对比生成时间序列框架(CoGenT),这是第一个通过联合对比-生成优化来统一这两种范式的框架。CoGenT解决了两种方法的基本局限:它克服了对比学习对时间数据中高类内相似性的敏感性,同时降低了生成方法对大型数据集的依赖。我们在六个不同的时间序列数据集上评估CoGenT。结果显示出一致的改进,与单独使用SimCLR和MAE相比,F1分别最多提升59.2%和14.27%。我们的分析表明,混合目标在保留判别能力的同时获得了生成式的鲁棒性。这些发现为时间域中的混合SSL奠定了基础。我们将很快发布代码。
摘要:Self-supervised learning (SSL) for multivariate time series mainly includes two paradigms: contrastive methods that excel at instance discrimination and generative approaches that model data distributions. While effective individually, their complementary potential remains unexplored. We propose a Contrastive Generative Time series framework (CoGenT), the first framework to unify these paradigms through joint contrastive-generative optimization. CoGenT addresses fundamental limitations of both approaches: it overcomes contrastive learning's sensitivity to high intra-class similarity in temporal data while reducing generative methods' dependence on large datasets. We evaluate CoGenT on six diverse time series datasets. The results show consistent improvements, with up to 59.2% and 14.27% F1 gains over standalone SimCLR and MAE, respectively. Our analysis reveals that the hybrid objective preserves discriminative power while acquiring generative robustness. These findings establish a foundation for hybrid SSL in temporal domains. We will release the code shortly.


【2】FusionEnsemble-Net: An Attention-Based Ensemble of Spatiotemporal Networks for Multimodal Sign Language Recognition
标题:FusionEnsemble-Net:用于多模态手语识别的基于注意力的时空网络集成
链接:https://arxiv.org/abs/2508.09362

作者: Islam, Md Rezwanul Haque, S M Taslim Uddin Raju, Fakhri Karray
备注:Accepted for the IEEE/CVF International Conference on Computer Vision (ICCV), Honolulu, Hawaii, USA. 1st MSLR Workshop 2025
摘要:手语在医疗保健通信的准确识别提出了一个重大的挑战,需要框架,可以准确地解释复杂的多模态手势。为了解决这个问题,我们提出了FusionEnsemble-Net,这是一种新的基于注意力的时空网络集成,可以动态融合视觉和运动数据,以提高识别精度。该方法通过四个不同的时空网络同步处理RGB视频和距离多普勒地图雷达模态。对于每个网络,来自两种模态的特征在被馈送到分类器集合之前使用基于注意力的融合模块连续地融合。最后,这四个不同的融合通道的输出组合在一个合奏分类头,从而提高模型的鲁棒性。实验表明,FusionEnsemble-Net优于最先进的方法,在意大利手语的大规模MultiMeDaLIS数据集上的测试准确率为99.44%。我们的研究结果表明,不同的时空网络的合奏,统一的基于注意力的融合,产生一个强大的和准确的框架,复杂的,多模态孤立的手势识别任务。源代码可从以下网址获得:https://github.com/rezwanh001/Multimodal-Isolated-Italian-Sign-Language-Recognition。
摘要:Accurate recognition of sign language in healthcare communication poses a significant challenge, requiring frameworks that can accurately interpret complex multimodal gestures. To deal with this, we propose FusionEnsemble-Net, a novel attention-based ensemble of spatiotemporal networks that dynamically fuses visual and motion data to enhance recognition accuracy. The proposed approach processes RGB video and range Doppler map radar modalities synchronously through four different spatiotemporal networks. For each network, features from both modalities are continuously fused using an attention-based fusion module before being fed into an ensemble of classifiers. Finally, the outputs of these four different fused channels are combined in an ensemble classification head, thereby enhancing the model's robustness. Experiments demonstrate that FusionEnsemble-Net outperforms state-of-the-art approaches with a test accuracy of 99.44% on the large-scale MultiMeDaLIS dataset for Italian Sign Language. Our findings indicate that an ensemble of diverse spatiotemporal networks, unified by attention-based fusion, yields a robust and accurate framework for complex, multimodal isolated gesture recognition tasks. The source code is available at: https://github.com/rezwanh001/Multimodal-Isolated-Italian-Sign-Language-Recognition.


表征(1篇)

【1】Open-Set Fault Diagnosis in Multimode Processes via Fine-Grained Deep Feature Representation
标题:基于细粒度深度特征表示的多模式过程开集故障诊断
链接:https://arxiv.org/abs/2508.09462

作者:g Li, M. Amine Atoui, Xiangshun Li
备注:34 pages, 12 figures
摘要:一个可靠的故障诊断系统不仅要对已知的健康状态进行准确的分类,而且要有效地识别未知的故障。在多模式过程中,属于同一健康状态的样本通常显示多个聚类分布,这使得难以为该状态构建紧凑且准确的决策边界。针对这一问题,提出了一种新的开集故障诊断模型--细粒度聚类拒识网络(FGCRN)。它结合了多尺度深度卷积、双向门控递归单元和时间注意机制来捕获鉴别特征。一个基于距离的损失函数的设计,以提高类内的紧凑性。通过无监督学习构建细粒度特征表示,以揭示每个健康状态的内在结构。采用极值理论对样本特征与其相应的细粒度表示之间的距离进行建模,从而能够有效地识别未知故障。大量的实验证明了所提出的方法的优越性能。
摘要:A reliable fault diagnosis system should not only accurately classify known health states but also effectively identify unknown faults. In multimode processes, samples belonging to the same health state often show multiple cluster distributions, making it difficult to construct compact and accurate decision boundaries for that state. To address this challenge, a novel open-set fault diagnosis model named fine-grained clustering and rejection network (FGCRN) is proposed. It combines multiscale depthwise convolution, bidirectional gated recurrent unit and temporal attention mechanism to capture discriminative features. A distance-based loss function is designed to enhance the intra-class compactness. Fine-grained feature representations are constructed through unsupervised learning to uncover the intrinsic structures of each health state. Extreme value theory is employed to model the distance between sample features and their corresponding fine-grained representations, enabling effective identification of unknown faults. Extensive experiments demonstrate the superior performance of the proposed method.


3D|3D重建等相关(1篇)

【1】TRACE: Learning 3D Gaussian Physical Dynamics from Multi-view Videos
标题:TRACE:从多视图视频中学习3D高斯物理动力学
链接:https://arxiv.org/abs/2508.09811

作者: Ziyang Song, Bo Yang
备注:ICCV 2025. Code and data are available.
摘要:在本文中,我们的目标是在没有任何人类标签的情况下,从动态多视图视频中建模3D场景的几何形状,外观和物理信息。通过利用物理信息损失作为软约束或将简单的物理模型集成到神经网络中,现有的工作通常无法学习复杂的运动物理,或者这样做需要额外的标签,如对象类型或掩码。我们提出了一个新的框架命名为TRACE复杂的动态三维场景的运动物理建模。我们的方法的关键新颖之处在于,通过将每个3D点表示为具有空间大小和方向的刚性粒子,我们直接为每个粒子学习平移旋转动力学系统,明确估计一组完整的物理参数来控制粒子随时间的运动。在三个现有的动态数据集和一个新创建的具有挑战性的合成数据集上进行的大量实验表明,我们的方法在未来的帧外推任务中具有超过基线的非凡性能。我们的框架的一个很好的属性是,多个对象或部分可以很容易地分割只是通过聚类学习的物理参数。
摘要:In this paper, we aim to model 3D scene geometry, appearance, and physical information just from dynamic multi-view videos in the absence of any human labels. By leveraging physics-informed losses as soft constraints or integrating simple physics models into neural nets, existing works often fail to learn complex motion physics, or doing so requires additional labels such as object types or masks. We propose a new framework named TRACE to model the motion physics of complex dynamic 3D scenes. The key novelty of our method is that, by formulating each 3D point as a rigid particle with size and orientation in space, we directly learn a translation rotation dynamics system for each particle, explicitly estimating a complete set of physical parameters to govern the particle's motion over time. Extensive experiments on three existing dynamic datasets and one newly created challenging synthetic datasets demonstrate the extraordinary performance of our method over baselines in the task of future frame extrapolation. A nice property of our framework is that multiple objects or parts can be easily segmented just by clustering the learned physical parameters.


优化|敛散性(3篇)

【1】Prototype Training with Dual Pseudo-Inverse and Optimized Hidden Activations
标题:基于对偶伪逆和优化隐藏激活的原型训练
链接:https://arxiv.org/abs/2508.09787

作者:ci
备注:7 pages, 1 table, reproducible, one proof
摘要:我们提出了Proto-PINV+H,这是一种快速训练范式,它将闭式权重计算与对一小组合成输入、软标签以及(尤为关键的)隐藏激活的基于梯度的优化相结合。在每次迭代中,我们通过两次(或更多次)岭正则化伪逆求解以闭式重新计算所有权重矩阵,同时仅用Adam更新原型。因此,可训练的自由度从权重空间转移到了数据/激活空间。在MNIST(60k训练,10k测试)和Fashion-MNIST(60k训练,10k测试)上,我们的方法在RTX 5060(16GB)上使用约130k个可训练参数、仅250个epoch,在3.9s-4.5s内于官方10k测试集上分别达到97.8%和89.3%的测试准确率。我们还提供了多层扩展(在每个隐藏阶段优化激活)、可学习的岭参数、可选的PCA/PLS投影,以及将原型矩阵的条件数与泛化联系起来的理论。与ELM、随机特征岭回归以及通过反向传播训练的浅层MLP相比,该方法在准确率、速度和模型大小之间取得了更有利的权衡。
摘要:We present Proto-PINV+H, a fast training paradigm that combines closed-form weight computation with gradient-based optimisation of a small set of synthetic inputs, soft labels, and, crucially, hidden activations. At each iteration we recompute all weight matrices in closed form via two (or more) ridge-regularised pseudo-inverse solves, while updating only the prototypes with Adam. The trainable degrees of freedom are thus shifted from weight space to data/activation space. On MNIST (60k train, 10k test) and Fashion-MNIST (60k train, 10k test), our method reaches 97.8% and 89.3% test accuracy on the official 10k test sets, respectively, in 3.9s--4.5s using approximately 130k trainable parameters and only 250 epochs on an RTX 5060 (16GB). We provide a multi-layer extension (optimised activations at each hidden stage), learnable ridge parameters, optional PCA/PLS projections, and theory linking the condition number of prototype matrices to generalisation. The approach yields favourable accuracy--speed--size trade-offs against ELM, random-feature ridge, and shallow MLPs trained by back-propagation.
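
For readers unfamiliar with the closed-form step, a ridge-regularised least-squares solve has the standard form below. The symbols $H$ (hidden activations), $Y$ (targets), $W$ (weights) and $\lambda$ (ridge parameter) are notation assumed here for illustration; the abstract does not give the paper's exact formulation.

```latex
W = \left(H^{\top} H + \lambda I\right)^{-1} H^{\top} Y, \qquad \lambda > 0
```

Because $H^{\top} H + \lambda I$ is positive definite for any $\lambda > 0$, the solve is always well posed, so the weights can be recomputed at every iteration while gradient steps are spent only on the prototypes.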


【2】Generative Modeling with Multi-Instance Reward Learning for E-commerce Creative Optimization
标题:电子商务创意优化的多实例奖励学习生成建模
链接:https://arxiv.org/abs/2508.09730

作者:u, Yu Li, DingYi Zeng, Lu Wang, Ming Pang, Changping Peng, Zhangang Lin, Ching Law, Jingping Shao
备注:9 pages, 3 figures, conference paper
摘要:在电子商务广告中,选择最具吸引力的创意元素组合(如标题、图像和亮点)对于吸引用户注意力和推动转化至关重要。然而,现有的方法通常单独评估创意组件,无法导航可能的组合的指数大的搜索空间。为了应对这一挑战,我们提出了一个名为GenCO的新型框架,该框架将生成式建模与多实例奖励学习集成在一起。我们统一的两阶段架构首先采用生成模型来有效地产生一组不同的创意组合。这个生成过程通过强化学习进行了优化,使模型能够有效地探索和细化其选择。接下来,为了克服稀疏用户反馈的挑战,多实例学习模型将组合级别的奖励(例如点击)归因于单个创意元素。这允许奖励模型提供更准确的反馈信号,这反过来又引导生成模型创建更有效的组合。在领先的电子商务平台上部署,我们的方法显著增加了广告收入,证明了其实用价值。此外,我们正在发布一个大规模的工业数据集,以促进在这一重要领域的进一步研究。
摘要:In e-commerce advertising, selecting the most compelling combination of creative elements -- such as titles, images, and highlights -- is critical for capturing user attention and driving conversions. However, existing methods often evaluate creative components individually, failing to navigate the exponentially large search space of possible combinations. To address this challenge, we propose a novel framework named GenCO that integrates generative modeling with multi-instance reward learning. Our unified two-stage architecture first employs a generative model to efficiently produce a diverse set of creative combinations. This generative process is optimized with reinforcement learning, enabling the model to effectively explore and refine its selections. Next, to overcome the challenge of sparse user feedback, a multi-instance learning model attributes combination-level rewards, such as clicks, to the individual creative elements. This allows the reward model to provide a more accurate feedback signal, which in turn guides the generative model toward creating more effective combinations. Deployed on a leading e-commerce platform, our approach has significantly increased advertising revenue, demonstrating its practical value. Additionally, we are releasing a large-scale industrial dataset to facilitate further research in this important domain.


【3】Temporal Anchoring in Deepening Embedding Spaces: Event-Indexed Projections, Drift, Convergence, and an Internal Computational Architecture
标题:深化嵌入空间中的时间锚定:事件索引投影、漂移、收敛和内部计算架构
链接:https://arxiv.org/abs/2508.09693

作者:ay, Bugra Kilictas, Hamdi Alakkad
备注:16 pages, 2 figures, 2 tables
摘要:我们为嵌入空间中的时间锚定建立了一个算子理论框架,将其建模为漂移映射与事件索引块交错、并以仿射投影收尾的过程。我们给出了以下结果的完整证明:可变块收缩引理(Lipschitz因子的乘积)、带有显式一致间隙包络的漂移-投影收敛定理,以及嵌套仿射锚下的本体收敛及其鲁棒性变体。我们形式化了一个内部的Manuscript Computer(MC),其计算完全由这些算子定义,并证明了一个严格的有限运行等价定理(带扰动界)。对于注意力层,我们给出了softmax在$\ell_2$下是$1/2$-Lipschitz的自包含证明,并推导出充分的层收缩条件(正交/非正交头)。所有浮动体都严格放置在其书写位置;手稿仅使用文内伪代码和附录图。
摘要:We develop an operator-theoretic framework for temporal anchoring in embedding spaces, modeled as drift maps interleaved with event-indexed blocks culminating in affine projections. We provide complete proofs for a variable-block contraction lemma (products of Lipschitz factors), a drift--projection convergence theorem with explicit uniform-gap envelopes, and ontological convergence under nested affine anchors with a robustness variant. We formalize an internal Manuscript Computer (MC) whose computations are defined purely by these operators and prove a rigorous finite-run equivalence theorem (with perturbation bounds). For attention layers, we give a self-contained proof that softmax is $1/2$-Lipschitz in $\ell_2$ and derive sufficient layer-contraction conditions (orthogonal/non-orthogonal heads). All floats are placed exactly where written; the manuscript uses only in-paper pseudocode and appendix figures.
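
The $1/2$-Lipschitz property of softmax in $\ell_2$ can be sanity-checked numerically. The sketch below samples random input pairs and records the largest observed ratio $\|\mathrm{softmax}(x)-\mathrm{softmax}(y)\|_2 / \|x-y\|_2$. The function names, the sampling range, and the dimension are assumptions for illustration; this is an empirical check, not the paper's proof.

```python
import math
import random


def softmax(x):
    """Numerically stable softmax for a list of floats."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def l2(u, v):
    """Euclidean distance between two equal-length lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def max_lipschitz_ratio(dim=5, trials=2000, seed=0):
    """Largest observed ||softmax(x) - softmax(y)|| / ||x - y|| over random pairs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = [rng.uniform(-3, 3) for _ in range(dim)]
        y = [rng.uniform(-3, 3) for _ in range(dim)]
        worst = max(worst, l2(softmax(x), softmax(y)) / l2(x, y))
    return worst
```

The observed ratio should never exceed 0.5: the softmax Jacobian is $\mathrm{diag}(p) - pp^{\top}$, whose quadratic form is the variance of the direction vector under $p$, and that variance is at most half the squared norm of the vector.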


预测|估计(7篇)

【1】RankList -- A Listwise Preference Learning Framework for Predicting Subjective Preferences
标题:RankList --一个预测主观偏好的列表式偏好学习框架
链接:https://arxiv.org/abs/2508.09826

作者:ddy Naini, Fernando Diaz, Carlos Busso
备注:12 pages, 2 figures
摘要:偏好学习在涉及人类主观判断的任务中受到了广泛关注,例如语音情感识别(SER)和图像美学评估。虽然RankNet等成对框架能够稳健地建模相对偏好,但它们本质上局限于局部比较,难以捕获全局排序一致性。为了解决这些局限,我们提出了RankList,一种新的列表式偏好学习框架,将RankNet推广到结构化的列表级监督。我们的形式化方法在概率框架内显式建模局部和非局部的排序约束。本文引入了log-sum-exp近似以提高训练效率。我们进一步用跳跃式(skip-wise)比较扩展RankList,使模型能够逐步接触复杂的列表结构,并提高全局排序的保真度。大量实验证明了我们的方法在不同模态上的优越性。在基准SER数据集(MSP-Podcast、IEMOCAP、BIIC Podcast)上,与标准列表式基线相比,RankList在Kendall's Tau和排序准确率方面取得了一致的提升。我们还使用Artistic Image Aesthetics数据集在美学图像排序任务上验证了该方法,突出了其广泛的适用性。通过消融和跨域研究,我们表明RankList不仅改进了域内排序,而且在跨数据集上的泛化也更好。我们的框架为主观学习场景中有序偏好的建模提供了一种统一、可扩展的方法。
摘要:Preference learning has gained significant attention in tasks involving subjective human judgments, such as speech emotion recognition (SER) and image aesthetic assessment. While pairwise frameworks such as RankNet offer robust modeling of relative preferences, they are inherently limited to local comparisons and struggle to capture global ranking consistency. To address these limitations, we propose RankList, a novel listwise preference learning framework that generalizes RankNet to structured list-level supervision. Our formulation explicitly models local and non-local ranking constraints within a probabilistic framework. The paper introduces a log-sum-exp approximation to improve training efficiency. We further extend RankList with skip-wise comparisons, enabling progressive exposure to complex list structures and enhancing global ranking fidelity. Extensive experiments demonstrate the superiority of our method across diverse modalities. On benchmark SER datasets (MSP-Podcast, IEMOCAP, BIIC Podcast), RankList achieves consistent improvements in Kendall's Tau and ranking accuracy compared to standard listwise baselines. We also validate our approach on aesthetic image ranking using the Artistic Image Aesthetics dataset, highlighting its broad applicability. Through ablation and cross-domain studies, we show that RankList not only improves in-domain ranking but also generalizes better across datasets. Our framework offers a unified, extensible approach for modeling ordered preferences in subjective learning scenarios.
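
The abstract mentions a log-sum-exp approximation but does not say which quantity it replaces. The sketch below therefore shows only the generic building block: a numerically stable log-sum-exp, which acts as a smooth upper bound on the maximum of a list of scores. The function name is an assumption, and this is not the RankList loss itself.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list of floats."""
    m = max(values)
    # Subtracting the maximum keeps every exponent at or below zero,
    # so exp() cannot overflow even for very large scores.
    return m + math.log(sum(math.exp(v - m) for v in values))
```

For any list of n scores the result lies between max(values) and max(values) + log(n), which is why it is a common differentiable stand-in for a hard maximum in ranking losses.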


【2】TriForecaster: A Mixture of Experts Framework for Multi-Region Electric Load Forecasting with Tri-dimensional Specialization
标题:TriForecaster:具有三维专业化的多区域电力负荷预测混合专家框架
链接:https://arxiv.org/abs/2508.09753

作者:Zhu, Zhipeng Zeng, Qiming Chen, Linxiao Yang, Peiyuan Liu, Weiqi Chen, Liang Sun
备注:11 pages, 4 figures
摘要:电力负荷预测对电力系统的运行、规划和决策至关重要。智能电网和智能电表的兴起提供了从家庭到母线再到城市等多个粒度级别上更详细、更高质量的负荷数据。受华东某省不同城市间相似负荷模式的启发,本文关注多区域电力负荷预测(MRELF)问题,目标是对一个大区域内的多个子区域进行准确的短期负荷预测。我们确定了MRELF的三个挑战:区域变化、上下文变化和时间变化。为应对这些挑战,我们提出了TriForecaster,这是一个在多任务学习(MTL)范式下利用混合专家(MoE)方法的新框架。TriForecaster包含RegionMixer和Context-Time Specializer(CTSpecializer)层,使专家模型能够在区域、上下文和时间维度上动态协作和专业化。在四个不同粒度的真实MRELF数据集上的评估表明,TriForecaster优于最先进的模型,平均预测误差降低22.4%,从而证明了其灵活性和广泛的适用性。特别是,TriForecaster在华东地区eForecaster平台上的部署体现了其实用价值:它有效地为17个城市提供城市级短期负荷预测,服务人口超过1.1亿,日用电量超过100吉瓦时。
摘要:Electric load forecasting is pivotal for power system operation, planning and decision-making. The rise of smart grids and meters has provided more detailed and high-quality load data at multiple levels of granularity, from home to bus and cities. Motivated by similar patterns of loads across different cities in a province in eastern China, in this paper we focus on the Multi-Region Electric Load Forecasting (MRELF) problem, targeting accurate short-term load forecasting for multiple sub-regions within a large region. We identify three challenges for MRELF, including regional variation, contextual variation, and temporal variation. To address them, we propose TriForecaster, a new framework leveraging the Mixture of Experts (MoE) approach within a Multi-Task Learning (MTL) paradigm to overcome these challenges. TriForecaster features RegionMixer and Context-Time Specializer (CTSpecializer) layers, enabling dynamic cooperation and specialization of expert models across regional, contextual, and temporal dimensions. Based on evaluation on four real-world MRELF datasets with varied granularity, TriForecaster outperforms state-of-the-art models by achieving an average forecast error reduction of 22.4%, thereby demonstrating its flexibility and broad applicability. In particular, the deployment of TriForecaster on the eForecaster platform in eastern China exemplifies its practical utility, effectively providing city-level, short-term load forecasts for 17 cities, supporting a population exceeding 110 million and daily electricity usage over 100 gigawatt-hours.


【3】Multimodal Sheaf-based Network for Glioblastoma Molecular Subtype Prediction
标题:用于胶质母细胞瘤分子亚型预测的基于层(sheaf)的多模态网络
链接:https://arxiv.org/abs/2508.09717

作者:Idrissova, Islem Rekik
摘要:胶质母细胞瘤是一种高度侵袭性的脑肿瘤,进展速度快。最近的研究表明,胶质母细胞瘤分子亚型分类是有效靶向治疗选择的重要生物标志物。然而,这种分类目前需要侵入性组织提取进行全面的组织病理学分析。结合MRI和组织病理学图像的现有多模态方法是有限的,并且缺乏用于跨模态保留共享结构信息的稳健机制。特别是,基于图形的模型往往无法保留异质图形内的判别特征,并且用于处理缺失或不完整模态数据的结构重建机制在很大程度上未被充分探索。为了解决这些局限性,我们提出了一种新的基于束的框架,用于MRI和组织病理学数据的结构感知和一致融合。我们的模型优于基线方法,并在不完整或缺失数据的情况下表现出鲁棒性,有助于开发用于快速诊断的虚拟活检工具。我们的源代码可在https://github.com/basiralab/MMSN/上获得。
摘要:Glioblastoma is a highly invasive brain tumor with rapid progression rates. Recent studies have shown that glioblastoma molecular subtype classification serves as a significant biomarker for effective targeted therapy selection. However, this classification currently requires invasive tissue extraction for comprehensive histopathological analysis. Existing multimodal approaches combining MRI and histopathology images are limited and lack robust mechanisms for preserving shared structural information across modalities. In particular, graph-based models often fail to retain discriminative features within heterogeneous graphs, and structural reconstruction mechanisms for handling missing or incomplete modality data are largely underexplored. To address these limitations, we propose a novel sheaf-based framework for structure-aware and consistent fusion of MRI and histopathology data. Our model outperforms baseline methods and demonstrates robustness in incomplete or missing data scenarios, contributing to the development of virtual biopsy tools for rapid diagnostics. Our source code is available at https://github.com/basiralab/MMSN/.


【4】A Lightweight Learned Cardinality Estimation Model
标题:一种轻量级的学习基数估计模型
链接:https://arxiv.org/abs/2508.09602

作者:, Jintao Zhang, Guoliang Li, Jianhua Feng
备注:IEEE Transactions on Knowledge and Data Engineering (TKDE), 2025
摘要:基数估计是数据库管理系统中的一项基本任务,其目的是在不执行查询的情况下准确地预测查询结果。然而,现有技术要么估计准确度低,要么推理延迟高。同时实现高速度和高准确度成为基数估计问题的关键。在本文中,我们提出了一种新的数据驱动方法CoDe(Covering with Decompositions)来解决这个问题。CoDe采用覆盖设计(covering design)的概念,将表划分为多个较小的、相互重叠的段。对于每个段,CoDe利用张量分解来准确地建模其数据分布。此外,CoDe引入了创新算法,为每个查询选择最佳拟合分布,并将它们组合起来以估计最终结果。通过采用多个模型来近似分布,CoDe在有效建模离散分布和确保计算效率方面表现出色。值得注意的是,实验结果表明,我们的方法在基数估计方面取得了显著进步,在估计精度和推理效率上均达到了最先进的水平。在各种数据集上,CoDe对超过一半的查询实现了完全准确的估计。
摘要:Cardinality estimation is a fundamental task in database management systems, aiming to predict query results accurately without executing the queries. However, existing techniques either achieve low estimation accuracy or incur high inference latency. Simultaneously achieving high speed and accuracy becomes critical for the cardinality estimation problem. In this paper, we propose a novel data-driven approach called CoDe (Covering with Decompositions) to address this problem. CoDe employs the concept of covering design, which divides the table into multiple smaller, overlapping segments. For each segment, CoDe utilizes tensor decomposition to accurately model its data distribution. Moreover, CoDe introduces innovative algorithms to select the best-fitting distributions for each query, combining them to estimate the final result. By employing multiple models to approximate distributions, CoDe excels in effectively modeling discrete distributions and ensuring computational efficiency. Notably, experimental results show that our method represents a significant advancement in cardinality estimation, achieving state-of-the-art levels of both estimation accuracy and inference efficiency. Across various datasets, CoDe achieves absolute accuracy in estimating more than half of the queries.


【5】Online Prediction with Limited Selectivity
标题:选择性有限的在线预测
链接:https://arxiv.org/abs/2508.09592

作者:iu, Mingda Qiao
摘要:选择性预测[Dru13,QV19]刻画了这样一种情景:预测者可以自由决定其预测所覆盖的预测窗口。许多数据统计量可以在不依赖任何分布假设或专家建议的情况下以非平凡的错误率进行预测,但这些结果依赖于预测者可以在任意时刻进行预测。我们引入了有限选择性预测(PLS)模型,其中预测者只能在时间范围的一个子集上开始预测。我们既逐实例地、也通过平均情况分析来研究最优预测误差。我们引入了一种复杂性度量,它给出了依赖于实例的最优误差界。对于随机生成的PLS实例,这些界以高概率匹配。
摘要:Selective prediction [Dru13, QV19] models the scenario where a forecaster freely decides on the prediction window that their forecast spans. Many data statistics can be predicted to a non-trivial error rate without any distributional assumptions or expert advice, yet these results rely on that the forecaster may predict at any time. We introduce a model of Prediction with Limited Selectivity (PLS) where the forecaster can start the prediction only on a subset of the time horizon. We study the optimal prediction error both on an instance-by-instance basis and via an average-case analysis. We introduce a complexity measure that gives instance-dependent bounds on the optimal error. For a randomly-generated PLS instance, these bounds match with high probability.


【6】Decentralized Weather Forecasting via Distributed Machine Learning and Blockchain-Based Model Validation
标题:通过分布式机器学习和基于区块链的模型验证进行去中心化天气预测
链接:https://arxiv.org/abs/2508.09299

作者:ar, Aydin Abadi, Basil Aldali, Benito Vincent, Elliot A. J. Hurley, Hotoon Aljazaeri, Jamie Hedley-Cook, Jamie-Lee Bell, Lambert Uwuigbusun, Mujeeb Ahmed, Shishir Nagaraja, Suleiman Sabo, Weaam Alrbeiqi
摘要:天气预报在备灾、农业和资源管理中发挥着至关重要的作用,但目前的集中式预报系统越来越受到安全漏洞、可扩展性有限和易受单点故障的影响。为了应对这些挑战,我们提出了一个分散的天气预报框架,将联邦学习(FL)与区块链技术相结合。FL支持协作模型训练,而不会暴露敏感的本地数据;这种方法增强了隐私性,减少了数据传输开销。同时,以太坊区块链确保了模型更新的透明和可靠验证。为了进一步增强系统的安全性,我们引入了一种基于声誉的投票机制,该机制评估提交模型的可信度,同时利用星际文件系统(IPFS)进行有效的链下存储。实验结果表明,我们的方法不仅提高了预测精度,而且还增强了系统的弹性和可扩展性,使其成为一个可行的候选人部署在现实世界中,安全关键的环境。
摘要:Weather forecasting plays a vital role in disaster preparedness, agriculture, and resource management, yet current centralized forecasting systems are increasingly strained by security vulnerabilities, limited scalability, and susceptibility to single points of failure. To address these challenges, we propose a decentralized weather forecasting framework that integrates Federated Learning (FL) with blockchain technology. FL enables collaborative model training without exposing sensitive local data; this approach enhances privacy and reduces data transfer overhead. Meanwhile, the Ethereum blockchain ensures transparent and dependable verification of model updates. To further enhance the system's security, we introduce a reputation-based voting mechanism that assesses the trustworthiness of submitted models while utilizing the Interplanetary File System (IPFS) for efficient off-chain storage. Experimental results demonstrate that our approach not only improves forecasting accuracy but also enhances system resilience and scalability, making it a viable candidate for deployment in real-world, security-critical environments.
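
The abstract does not specify how client updates are aggregated. A common choice in federated learning is sample-size-weighted averaging (FedAvg-style), sketched below as an assumption; the function name and the flat-list parameter format are illustrative, and the blockchain validation and reputation voting described above are out of scope.

```python
def federated_average(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors (FedAvg-style).

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes:   number of local training samples held by each client.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]
```

Only parameter vectors and sample counts leave each client, so the raw local data is never transmitted, which is the privacy property the abstract attributes to FL.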


【7】Peer Effect Estimation in the Presence of Simultaneous Feedback and Unobserved Confounders
标题:同时反馈和未观察到的混杂因素存在下的同伴效应估计
链接:https://arxiv.org/abs/2508.09154

作者:Du, Jiuyong Li, Lin Liu, Debo Cheng, Thuc.Le
摘要:在社交网络等复杂的现实世界网络中,估计同伴因果效应具有挑战性,这主要源于同伴之间的同时反馈以及未观察到的混杂因素。现有方法要么处理未观察到的混杂因素而忽略同时反馈,要么考虑了反馈但依赖限制性的线性假设,因而无法获得准确的同伴效应估计。在本文中,我们提出了DIG2RSI,这是一种新的深度学习框架,它利用I-G变换(矩阵运算)和2SRI(一种工具变量即IV技术)来同时处理同时反馈和未观察到的混杂,并能适应复杂、非线性和高维的关系。DIG2RSI首先应用I-G变换来解开同伴间的相互影响,消除由同时反馈引起的偏差。为了处理未观察到的混杂,我们首先从网络数据中构建有效的IV。在2SRI的第一阶段,我们在这些IV上训练神经网络来预测同伴暴露,并提取残差作为未观察到的混杂因素的代理。在第二阶段,我们拟合一个单独的神经网络,并用对抗判别器对其进行增强;该网络将这些残差作为控制函数纳入,并强制所学习的表示不包含任何残余的混杂信号。深度学习模型在捕捉复杂非线性关系方面的表达能力,加上对抗去偏,增强了DIG2RSI在消除反馈回路和隐藏混杂因素所致偏差方面的有效性。我们证明了我们的估计量在标准正则性条件下的一致性,从而保证能够渐近地恢复真实的同伴效应。在两个半合成基准和一个真实世界数据集上的实证结果表明,DIG2RSI优于现有方法。
摘要:Estimating peer causal effects within complex real-world networks such as social networks is challenging, primarily due to simultaneous feedback between peers and unobserved confounders. Existing methods either address unobserved confounders while ignoring the simultaneous feedback, or account for feedback but under restrictive linear assumptions, thus failing to obtain accurate peer effect estimation. In this paper, we propose DIG2RSI, a novel Deep learning framework which leverages I-G transformation (matrix operation) and 2SRI (an instrumental variable or IV technique) to address both simultaneous feedback and unobserved confounding, while accommodating complex, nonlinear and high-dimensional relationships. DIG2RSI first applies the I-G transformation to disentangle mutual peer influences and eliminate the bias due to the simultaneous feedback. To deal with unobserved confounding, we first construct valid IVs from network data. In stage 1 of 2RSI, we train a neural network on these IVs to predict peer exposure, and extract residuals as proxies for the unobserved confounders. In the stage 2, we fit a separate neural network augmented by an adversarial discriminator that incorporates these residuals as a control function and enforces the learned representation to contain no residual confounding signal. The expressive power of deep learning models in capturing complex non-linear relationships and adversarial debiasing enhances the effectiveness of DIG2RSI in eliminating bias from both feedback loops and hidden confounders. We prove consistency of our estimator under standard regularity conditions, ensuring asymptotic recovery of the true peer effect. Empirical results on two semi-synthetic benchmarks and a real-world dataset demonstrate that DIG2RSI outperforms existing approaches.


其他神经网络|深度学习|模型|建模(25篇)

【1】Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models
标题:噪声超网络:在扩散模型中摊销测试时计算
链接:https://arxiv.org/abs/2508.09968

作者:ng, Shyamgopal Karthik, Alexey Dosovitskiy, Nataniel Ruiz, Zeynep Akata
备注:Project page: this https URL
摘要:测试时缩放这一新范式已在大型语言模型(LLM)(例如推理模型)和生成式视觉模型中取得了显著突破,它允许模型在推理过程中分配额外的计算,以有效解决日益复杂的问题。尽管这种方法带来了改进,但一个重要的局限也随之出现:计算时间大幅增加,使得该过程缓慢,对许多应用而言不切实际。鉴于这一范式的成功及其日益广泛的使用,我们希望在保留其优势的同时避免推理开销。在这项工作中,我们针对“在后训练阶段将测试时缩放知识集成到模型中”这一关键问题提出了一种解决方案。具体来说,我们用一个调制初始输入噪声的噪声超网络,取代扩散模型中由奖励引导的测试时噪声优化。我们提出了一个有理论依据的框架,通过一个易于处理的噪声空间目标,为蒸馏生成器学习这种奖励倾斜分布,在优化所需特性的同时保持对基础模型的保真度。我们表明,该方法仅以一小部分计算成本,就恢复了显式测试时优化所带来的质量增益中的很大一部分。代码可在https://github.com/ExplainableML/HyperNoise上获得
摘要:The new paradigm of test-time scaling has yielded remarkable breakthroughs in Large Language Models (LLMs) (e.g. reasoning models) and in generative vision models, allowing models to allocate additional computation during inference to effectively tackle increasingly complex problems. Despite the improvements of this approach, an important limitation emerges: the substantial increase in computation time makes the process slow and impractical for many applications. Given the success of this paradigm and its growing usage, we seek to preserve its benefits while eschewing the inference overhead. In this work we propose one solution to the critical problem of integrating test-time scaling knowledge into a model during post-training. Specifically, we replace reward guided test-time noise optimization in diffusion models with a Noise Hypernetwork that modulates initial input noise. We propose a theoretically grounded framework for learning this reward-tilted distribution for distilled generators, through a tractable noise-space objective that maintains fidelity to the base model while optimizing for desired characteristics. We show that our approach recovers a substantial portion of the quality gains from explicit test-time optimization at a fraction of the computational cost. Code is available at https://github.com/ExplainableML/HyperNoise


【2】Stable Diffusion Models are Secretly Good at Visual In-Context Learning
标题:Stable Diffusion模型其实擅长视觉上下文学习
链接:https://arxiv.org/abs/2508.09949

作者:orloff, Vishwanath Sindagi, Wele Gedara Chaminda Bandara, Ali Shafahi, Amin Ghiasi, Charan Prakash, Reza Ardekani
备注:Accepted to ICCV 2025
摘要:自然语言处理(NLP)中的大型语言模型(LLM)已展现出上下文学习(ICL)的巨大潜力,即利用少量示例提示来适应各种任务,而无需显式更新模型权重。ICL最近被探索用于计算机视觉任务,并取得了有希望的早期成果。这些方法需要专门的训练和/或额外的数据,使流程复杂化并限制了其泛化性。在这项工作中,我们表明,现成的Stable Diffusion模型可以被重新用于视觉上下文学习(V-ICL)。具体来说,我们在Stable Diffusion架构的自注意力层内设计了一种原位注意力重计算,显式地纳入查询与示例提示之间的上下文。在没有任何额外微调的情况下,我们证明这种重新利用的Stable Diffusion模型能够适应六种不同的任务:前景分割、单目标检测、语义分割、关键点检测、边缘检测和着色。例如,在Pascal-5i数据集的前景分割任务上,所提方法的平均交并比(mIoU)分别比Visual Prompting和IMProv等近期方法提高了8.9%和3.2%。此外,我们表明,所提方法能够通过集成有效利用多个提示,更好地推断任务并进一步提高性能。
摘要:Large language models (LLM) in natural language processing (NLP) have demonstrated great potential for in-context learning (ICL) -- the ability to leverage a few sets of example prompts to adapt to various tasks without having to explicitly update the model weights. ICL has recently been explored for computer vision tasks with promising early outcomes. These approaches involve specialized training and/or additional data that complicate the process and limit its generalizability. In this work, we show that off-the-shelf Stable Diffusion models can be repurposed for visual in-context learning (V-ICL). Specifically, we formulate an in-place attention re-computation within the self-attention layers of the Stable Diffusion architecture that explicitly incorporates context between the query and example prompts. Without any additional fine-tuning, we show that this repurposed Stable Diffusion model is able to adapt to six different tasks: foreground segmentation, single object detection, semantic segmentation, keypoint detection, edge detection, and colorization. For example, the proposed approach improves the mean intersection over union (mIoU) for the foreground segmentation task on Pascal-5i dataset by 8.9% and 3.2% over recent methods such as Visual Prompting and IMProv, respectively. Additionally, we show that the proposed method is able to effectively leverage multiple prompts through ensembling to infer the task better and further improve the performance.


【3】Residual Reservoir Memory Networks
标题:残差储备池记忆网络
链接:https://arxiv.org/abs/2508.09925

作者:nna, Andrea Ceni, Claudio Gallicchio
备注:7 pages, 6 figures, accepted at IJCNN 2025
摘要:我们在储备池计算(RC)范式中引入了一类新的无需训练的递归神经网络(RNN),称为残差储备池记忆网络(ResRMN)。ResRMN将线性记忆储备池与非线性储备池相结合,其中后者基于沿时间维度的残差正交连接,用于增强输入的长期传播。我们通过线性稳定性分析研究由此产生的储备池状态动态,并考察时间残差连接的不同配置。所提出的方法在时间序列和像素级一维分类任务上进行了实证评估。实验结果突出了所提方法相对于其他传统RC模型的优势。
摘要:We introduce a novel class of untrained Recurrent Neural Networks (RNNs) within the Reservoir Computing (RC) paradigm, called Residual Reservoir Memory Networks (ResRMNs). ResRMN combines a linear memory reservoir with a non-linear reservoir, where the latter is based on residual orthogonal connections along the temporal dimension for enhanced long-term propagation of the input. The resulting reservoir state dynamics are studied through the lens of linear stability analysis, and we investigate diverse configurations for the temporal residual connections. The proposed approach is empirically assessed on time-series and pixel-level 1-D classification tasks. Our experimental results highlight the advantages of the proposed approach over other conventional RC models.


【4】Modern Neural Networks for Small Tabular Datasets: The New Default for Field-Scale Digital Soil Mapping?
标题:小表格数据集的现代神经网络:田间尺度数字土壤制图的新默认值?
链接:https://arxiv.org/abs/2508.09888

作者:v Barkov, Jonas Schmidinger, Robin Gebbers, Martin Atzmueller
摘要:在土壤计量学(pedometrics)领域,表格机器学习是根据遥感和近地土壤传感数据预测土壤性质的主要方法,是数字土壤制图的核心组成部分。在田间尺度上,这种预测性土壤建模(PSM)任务通常受限于较小的训练样本量和土壤光谱中较高的特征-样本比。传统上,这些条件对常规深度学习方法而言颇具挑战。经典的机器学习算法,特别是基于树的模型(如随机森林)和线性模型(如偏最小二乘回归),长期以来一直是田间尺度PSM的默认选择。面向表格数据的人工神经网络(ANN)的最新进展挑战了这一观点,但其对田间尺度PSM的适用性尚未得到证明。我们提出了一个全面的基准,评估最先进的ANN架构,包括最新的基于多层感知器(MLP)的模型(TabM、RealMLP)、基于注意力的Transformer变体(FT-Transformer、ExcelFormer、T2G-Former、AMFormer)、检索增强方法(TabR、ModernNCA)以及上下文学习基础模型(TabPFN)。我们的评估涵盖31个田间和农场尺度的数据集,样本数为30至460,涉及三种关键土壤性质:土壤有机质或土壤有机碳、pH值和粘土含量。结果表明,现代ANN在大多数任务上始终优于经典方法,说明深度学习已足够成熟,可以打破经典机器学习在PSM中的长期主导地位。值得注意的是,TabPFN的整体性能最强,并在不同条件下表现出鲁棒性。因此,我们建议在田间尺度PSM中采用现代ANN,并提议将TabPFN作为每位土壤计量学家工具包中的新默认选择。
摘要:In the field of pedometrics, tabular machine learning is the predominant method for predicting soil properties from remote and proximal soil sensing data, forming a central component of digital soil mapping. At the field-scale, this predictive soil modeling (PSM) task is typically constrained by small training sample sizes and high feature-to-sample ratios in soil spectroscopy. Traditionally, these conditions have proven challenging for conventional deep learning methods. Classical machine learning algorithms, particularly tree-based models like Random Forest and linear models such as Partial Least Squares Regression, have long been the default choice for field-scale PSM. Recent advances in artificial neural networks (ANN) for tabular data challenge this view, yet their suitability for field-scale PSM has not been proven. We introduce a comprehensive benchmark that evaluates state-of-the-art ANN architectures, including the latest multilayer perceptron (MLP)-based models (TabM, RealMLP), attention-based transformer variants (FT-Transformer, ExcelFormer, T2G-Former, AMFormer), retrieval-augmented approaches (TabR, ModernNCA), and an in-context learning foundation model (TabPFN). Our evaluation encompasses 31 field- and farm-scale datasets containing 30 to 460 samples and three critical soil properties: soil organic matter or soil organic carbon, pH, and clay content. Our results reveal that modern ANNs consistently outperform classical methods on the majority of tasks, demonstrating that deep learning has matured sufficiently to overcome the long-standing dominance of classical machine learning for PSM. Notably, TabPFN delivers the strongest overall performance, showing robustness across varying conditions. We therefore recommend the adoption of modern ANNs for field-scale PSM and propose TabPFN as the new default choice in the toolkit of every pedometrician.


【5】A Machine Learning Approach to Predict Biological Age and its Longitudinal Drivers
标题:预测生物年龄及其纵向驱动因素的机器学习方法
链接:https://arxiv.org/abs/2508.09747

作者:nbayeva, Yulong Li, Yutong Xie, Imran Razzak
摘要:预测个体的衰老轨迹是预防医学和生物信息学的核心挑战。虽然机器学习模型可以根据生物标志物预测实际年龄,但它们通常无法捕捉衰老过程动态的、纵向的特性。在这项工作中,我们开发并验证了一个机器学习流程,使用包含两个不同时间段(2019-2020年和2021-2022年)数据的纵向队列来预测年龄。我们证明,仅使用静态横截面生物标志物的模型在推广到未来时间点时预测能力有限。然而,通过构造显式刻画关键生物标志物随时间变化率(斜率)的新特征,我们显著提高了模型性能。我们最终的LightGBM模型在第一波数据上训练,能够以较高的准确性预测下一波数据中的年龄(男性$R^2 = 0.515$,女性$R^2 = 0.498$),显著优于传统线性模型和其他基于树的集成模型。对该模型的SHAP分析显示,构造的斜率特征属于最重要的预测因素之列,这表明个体的健康轨迹,而不仅仅是其静态的健康快照,是生物年龄的关键决定因素。我们的框架为动态跟踪患者健康轨迹的临床工具铺平了道路,从而支持针对年龄相关疾病的早期干预和个性化预防策略。
摘要:Predicting an individual's aging trajectory is a central challenge in preventative medicine and bioinformatics. While machine learning models can predict chronological age from biomarkers, they often fail to capture the dynamic, longitudinal nature of the aging process. In this work, we developed and validated a machine learning pipeline to predict age using a longitudinal cohort with data from two distinct time periods (2019-2020 and 2021-2022). We demonstrate that a model using only static, cross-sectional biomarkers has limited predictive power when generalizing to future time points. However, by engineering novel features that explicitly capture the rate of change (slope) of key biomarkers over time, we significantly improved model performance. Our final LightGBM model, trained on the initial wave of data, successfully predicted age in the subsequent wave with high accuracy ($R^2 = 0.515$ for males, $R^2 = 0.498$ for females), significantly outperforming both traditional linear models and other tree-based ensembles. SHAP analysis of our successful model revealed that the engineered slope features were among the most important predictors, highlighting that an individual's health trajectory, not just their static health snapshot, is a key determinant of biological age. Our framework paves the way for clinical tools that dynamically track patient health trajectories, enabling early intervention and personalized prevention strategies for age-related diseases.
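
The slope features mentioned in this abstract can be illustrated with a small sketch. The abstract does not say how the slopes are computed, so the two-point difference quotient below, the function names `slope_features` and `build_row`, and the biomarker names are all assumptions made for illustration, not the authors' pipeline.

```python
def slope_features(wave1, wave2, years_between):
    """Rate of change per year for every biomarker measured in both waves.

    wave1 / wave2 map biomarker name -> measured value (hypothetical names).
    """
    if years_between <= 0:
        raise ValueError("years_between must be positive")
    return {
        f"{name}_slope": (wave2[name] - wave1[name]) / years_between
        for name in wave1
        if name in wave2
    }


def build_row(wave1, wave2, years_between):
    """Combine the static snapshot from wave 1 with the engineered slopes."""
    row = dict(wave1)
    row.update(slope_features(wave1, wave2, years_between))
    return row


first = {"glucose": 5.0, "hdl": 1.5}
second = {"glucose": 5.6, "hdl": 1.25}
row = build_row(first, second, years_between=2.0)
```

A row built this way could then be passed to any tabular regressor; the abstract reports LightGBM, which is not used here so that the sketch stays dependency-free.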


【6】Improving ARDS Diagnosis Through Context-Aware Concept Bottleneck Models
标题:通过上下文感知概念瓶颈模型改进ARDS诊断
链接:https://arxiv.org/abs/2508.09719

作者:ain, Ritam Majumdar, Nikita Narayanan, Dominic Marshall, Sonali Parbhoo
备注:32 pages, 7 figures, accepted at Machine Learning for Healthcare Conference (MLHC) 2025
摘要:大型的、公开的临床数据集已经成为了解疾病异质性和探索个性化治疗的新资源。这些数据集来自最初并非为研究目的收集的数据,因此往往不完整,缺乏关键标签。已经开发了许多人工智能工具来回顾性地标记这些数据集,例如通过执行疾病分类;然而,它们通常具有有限的可解释性。以前的工作试图使用概念瓶颈模型(CBMs)来解释预测,CBMs学习可解释的概念,这些概念映射到更高层次的临床想法,促进人类评估。然而,这些模型往往会遇到性能限制时,概念无法充分解释或表征的任务。我们使用急性呼吸窘迫综合征(ARDS)的识别作为一个具有挑战性的测试案例,以证明将上下文信息从临床笔记,以提高CBM性能的价值。我们的方法利用大型语言模型(LLM)来处理临床笔记并生成额外的概念,与现有方法相比,性能提高了10%。此外,它有助于学习更全面的概念,从而降低信息泄露的风险和对虚假捷径的依赖,从而改善ARDS的表征。
摘要:Large, publicly available clinical datasets have emerged as a novel resource for understanding disease heterogeneity and exploring personalization of therapy. These datasets are derived from data not originally collected for research purposes and, as a result, are often incomplete and lack critical labels. Many AI tools have been developed to retrospectively label these datasets, such as by performing disease classification; however, they often suffer from limited interpretability. Previous work has attempted to explain predictions using Concept Bottleneck Models (CBMs), which learn interpretable concepts that map to higher-level clinical ideas, facilitating human evaluation. However, these models often experience performance limitations when the concepts fail to adequately explain or characterize the task. We use the identification of Acute Respiratory Distress Syndrome (ARDS) as a challenging test case to demonstrate the value of incorporating contextual information from clinical notes to improve CBM performance. Our approach leverages a Large Language Model (LLM) to process clinical notes and generate additional concepts, resulting in a 10% performance gain over existing methods. Additionally, it facilitates the learning of more comprehensive concepts, thereby reducing the risk of information leakage and reliance on spurious shortcuts, thus improving the characterization of ARDS.


【7】Personalized Product Search Ranking: A Multi-Task Learning Approach with Tabular and Non-Tabular Data
标题:个性化产品搜索排名:具有表格和非表格数据的多任务学习方法
链接:https://arxiv.org/abs/2508.09636

作者:Morishetti, Abhay Kumar, Jonathan Scott, Kaushiki Nag, Gunjan Sharma, Shanu Vashishtha, Rahul Sridhar, Rohit Chatter, Kannan Achan
备注:17 pages, 2 figures, The Pacific Rim International Conference on Artificial Intelligence (PRICAI-2025) Conference
摘要:在本文中,我们提出了一种新的模型架构,优化个性化的产品搜索排名使用多任务学习(MTL)框架。我们的方法独特地集成了表格和非表格数据,利用预先训练的TinyBERT模型进行语义嵌入,并采用新颖的采样技术来捕获不同的客户行为。我们根据几个基线评估我们的模型,包括XGBoost,TabNet,FT-Transformer,DCN-V2和MMoE,重点关注它们处理混合数据类型和优化个性化排名的能力。此外,我们提出了一个可扩展的相关性标记机制的基础上点击率,点击位置和语义相似性,提供了一种替代传统的人类注释的标签。实验结果表明,在多任务学习范式中,将非表格数据与先进的嵌入技术相结合,显著提高了模型的性能。消融研究进一步强调了整合相关性标签、微调TinyBERT层和TinyBERT查询-产品嵌入交互的好处。这些结果表明,我们的方法在实现改进的个性化产品搜索排名的有效性。
摘要:In this paper, we present a novel model architecture for optimizing personalized product search ranking using a multi-task learning (MTL) framework. Our approach uniquely integrates tabular and non-tabular data, leveraging a pre-trained TinyBERT model for semantic embeddings and a novel sampling technique to capture diverse customer behaviors. We evaluate our model against several baselines, including XGBoost, TabNet, FT-Transformer, DCN-V2, and MMoE, focusing on their ability to handle mixed data types and optimize personalized ranking. Additionally, we propose a scalable relevance labeling mechanism based on click-through rates, click positions, and semantic similarity, offering an alternative to traditional human-annotated labels. Experimental results show that combining non-tabular data with advanced embedding techniques in multi-task learning paradigm significantly enhances model performance. Ablation studies further underscore the benefits of incorporating relevance labels, fine-tuning TinyBERT layers, and TinyBERT query-product embedding interactions. These results demonstrate the effectiveness of our approach in achieving improved personalized product search ranking.


【8】Edge General Intelligence Through World Models and Agentic AI: Fundamentals, Solutions, and Challenges
标题:通过世界模型和智能体AI实现边缘通用智能:基础、解决方案与挑战
链接:https://arxiv.org/abs/2508.09561

作者: Zhao, Guangyuan Liu, Ruichen Zhang, Yinqiu Liu, Jiacheng Wang, Jiawen Kang, Dusit Niyato, Zan Li, Xuemin (Sherman) Shen, Zhu Han, Sumei Sun, Chau Yuen, Dong In Kim
备注:21 pages, 9 figures
摘要:边缘通用智能(EGI)代表了边缘计算的变革性发展,其中分布式代理拥有在不同的动态环境中自主感知、推理和行动的能力。这一愿景的核心是世界模型,它充当主动的内部模拟器,不仅预测而且积极想象未来的轨迹,在不确定性下进行推理,并有远见地计划多步行动。这种积极主动的性质使代理能够预测潜在的结果,并在现实世界的交互之前优化决策。虽然之前在机器人和游戏方面的工作已经展示了世界模型的潜力,但它们与EGI无线边缘的集成仍然有待探索。这项调查通过全面分析世界模型如何在边缘增强代理人工智能(AI)系统来弥合这一差距。我们首先研究世界模型的架构基础,包括潜在表示学习,动态建模和基于简化的规划。基于这些核心功能,我们展示了它们在EGI场景中的主动应用,如车载网络,无人机(UAV)网络,物联网(IoT)系统和网络功能虚拟化,从而强调它们如何在延迟,能源和隐私约束下增强优化。然后,我们探索它们与基础模型和数字孪生的协同作用,将世界模型定位为EGI的认知支柱。最后,我们强调了开放的挑战,如安全保障,有效的培训和部署的约束,并概述了未来的研究方向。该调查为实现下一代智能自主边缘系统提供了概念基础和实用路线图。
摘要:Edge General Intelligence (EGI) represents a transformative evolution of edge computing, where distributed agents possess the capability to perceive, reason, and act autonomously across diverse, dynamic environments. Central to this vision are world models, which act as proactive internal simulators that not only predict but also actively imagine future trajectories, reason under uncertainty, and plan multi-step actions with foresight. This proactive nature allows agents to anticipate potential outcomes and optimize decisions ahead of real-world interactions. While prior works in robotics and gaming have showcased the potential of world models, their integration into the wireless edge for EGI remains underexplored. This survey bridges this gap by offering a comprehensive analysis of how world models can empower agentic artificial intelligence (AI) systems at the edge. We first examine the architectural foundations of world models, including latent representation learning, dynamics modeling, and imagination-based planning. Building on these core capabilities, we illustrate their proactive applications across EGI scenarios such as vehicular networks, unmanned aerial vehicle (UAV) networks, the Internet of Things (IoT) systems, and network functions virtualization, thereby highlighting how they can enhance optimization under latency, energy, and privacy constraints. We then explore their synergy with foundation models and digital twins, positioning world models as the cognitive backbone of EGI. Finally, we highlight open challenges, such as safety guarantees, efficient training, and constrained deployment, and outline future research directions. This survey provides both a conceptual foundation and a practical roadmap for realizing the next generation of intelligent, autonomous edge systems.


【9】Decentralized Rank Scheduling for Energy-Constrained Multi-Task Federated Fine-Tuning in Edge-Assisted IoV Networks
标题:边缘辅助IoV网络中能量受限多任务联邦微调的去中心化秩调度
链接:https://arxiv.org/abs/2508.09532

作者:eng, Jianqiang Zhong, Jiayi Liu, Xiaoxi Zhang
摘要:联邦微调已成为一种很有前途的方法,用于使基础模型(FM)适应边缘环境中的各种下游任务。在车联网(IoV)系统中,由于客户端移动性、资源异构和连接间歇,实现高效且低延迟的多任务适配尤其具有挑战性。本文提出了一种分层联邦微调框架,协调路边单元(RSU)和车辆,以支持动态车联网场景下资源感知且对移动性具有弹性的学习。基于低秩适配(LoRA),我们引入了一种去中心化、能量感知的秩自适应机制,并将其表述为受约束的多臂老虎机问题。我们提出了一种新的UCB-DUAL算法,能够在每个任务的能量预算下进行自适应探索,并实现可证明的次线性遗憾。为了评估该方法,我们基于真实世界轨迹构建了一个大规模车联网模拟器,刻画动态参与、RSU切换和通信的可变性。大量实验表明,我们的方法在所有基线中实现了最佳的准确性-效率权衡,将延迟降低了24%以上,并将平均准确率提高了2.5%以上。
摘要:Federated fine-tuning has emerged as a promising approach for adapting foundation models (FMs) to diverse downstream tasks in edge environments. In Internet of Vehicles (IoV) systems, enabling efficient and low-latency multi-task adaptation is particularly challenging due to client mobility, heterogeneous resources, and intermittent connectivity. This paper proposes a hierarchical federated fine-tuning framework that coordinates roadside units (RSUs) and vehicles to support resource-aware and mobility-resilient learning across dynamic IoV scenarios. Leveraging Low-Rank Adaptation (LoRA), we introduce a decentralized, energy-aware rank adaptation mechanism formulated as a constrained multi-armed bandit problem. A novel UCB-DUAL algorithm is developed to enable adaptive exploration under per-task energy budgets, achieving provable sublinear regret. To evaluate our method, we construct a large-scale IoV simulator based on real-world trajectories, capturing dynamic participation, RSU handoffs, and communication variability. Extensive experiments show that our approach achieves the best accuracy-efficiency trade-off among all baselines, reducing latency by over 24% and improving average accuracy by more than 2.5%.
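
To make the idea of a budget-constrained bandit over LoRA ranks concrete, here is a generic primal-dual UCB sketch. It is not the paper's UCB-DUAL algorithm, whose details the abstract does not give. The names `select_rank` and `dual_update`, the UCB1-style bonus, and the projected dual step are assumptions chosen for illustration: each arm is a candidate rank, its score is an optimistic reward estimate minus a Lagrange multiplier times its energy cost, and the multiplier grows whenever observed energy exceeds the budget.

```python
import math


def select_rank(counts, mean_rewards, energy_costs, lam, t):
    """Pick the arm (candidate LoRA rank) with the best penalized UCB score."""
    # Try every arm once before trusting the estimates.
    for arm, pulls in enumerate(counts):
        if pulls == 0:
            return arm

    def score(arm):
        bonus = math.sqrt(2.0 * math.log(t) / counts[arm])  # UCB1-style bonus
        return mean_rewards[arm] + bonus - lam * energy_costs[arm]

    return max(range(len(counts)), key=score)


def dual_update(lam, observed_energy, budget, step_size):
    """Projected gradient step on the Lagrange multiplier (kept non-negative)."""
    return max(0.0, lam + step_size * (observed_energy - budget))


# With no energy penalty the high-reward, high-energy rank wins;
# a larger multiplier shifts the choice to the cheaper rank.
cheap_when_penalized = select_rank([10, 10], [0.9, 0.5], [5.0, 1.0], lam=1.0, t=20)
```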


【10】MiCo: End-to-End Mixed Precision Neural Network Co-Exploration Framework for Edge AI
标题:MiCo:面向边缘AI的端到端混合精度神经网络协同探索框架
链接:https://arxiv.org/abs/2508.09500

作者:ng, Yangdi Lyu
备注:9 pages, 6 figures, accepted by ICCAD'25
摘要:具有极低位宽数据的量化神经网络(QNN)已被证明在边缘设备上的高效存储和计算方面很有前途。为了进一步降低精度下降,同时提高加速比,逐层混合精度量化(MPQ)成为一种流行的解决方案。然而,现有的探索MPQ方案的算法在灵活性和效率方面受到限制。理解不同MPQ方案对训练后量化和量化感知训练结果的复杂影响是传统方法的挑战。此外,在现有的工作中缺少用于MPQ模型的优化和部署的端到端框架。   在本文中,我们提出了MiCo框架,这是一个针对边缘AI应用程序的整体MPQ探索和部署框架。该框架采用了一种新的优化算法来搜索最佳量化方案,具有最高的精度,同时满足延迟约束。针对不同的硬件目标构建硬件感知延迟模型,以实现快速探索。在探索之后,该框架可以从PyTorch MPQ模型直接部署到裸机C代码,从而以最小的准确性下降实现端到端的加速。
摘要:Quantized Neural Networks (QNN) with extremely low-bitwidth data have proven promising in efficient storage and computation on edge devices. To further reduce the accuracy drop while increasing speedup, layer-wise mixed-precision quantization (MPQ) becomes a popular solution. However, existing algorithms for exploring MPQ schemes are limited in flexibility and efficiency. Comprehending the complex impacts of different MPQ schemes on post-training quantization and quantization-aware training results is a challenge for conventional methods. Furthermore, an end-to-end framework for the optimization and deployment of MPQ models is missing in existing work.   In this paper, we propose the MiCo framework, a holistic MPQ exploration and deployment framework for edge AI applications. The framework adopts a novel optimization algorithm to search for optimal quantization schemes with the highest accuracies while meeting latency constraints. Hardware-aware latency models are built for different hardware targets to enable fast explorations. After the exploration, the framework enables direct deployment from PyTorch MPQ models to bare-metal C codes, leading to end-to-end speedup with minimal accuracy drops.


【11】Implicit Hypergraph Neural Networks: A Stable Framework for Higher-Order Relational Learning with Provable Guarantees
标题:隐式超图神经网络:具有可证明保证的高阶关系学习稳定框架
链接:https://arxiv.org/abs/2508.09427

作者:, Guangyu Tang, Jiaojiao Jiang
摘要:许多现实世界的交互是基于组的,而不是成对的,例如有多个共同作者和用户共同参与项目的论文。超图神经网络在建模高阶关系方面表现出了很大的潜力,但它们对固定数量的显式消息传递层的依赖限制了长距离依赖捕获,并且随着深度的增加会使训练不稳定。在这项工作中,我们引入了隐式超图神经网络(IHGNN),它为超图带来了隐式平衡公式:而不是堆叠层,IHGNN将表示计算为非线性定点方程的解,从而在没有深度架构的情况下实现了超边的稳定和高效的全局传播。我们开发了一个适定的训练计划,证明收敛,分析过平滑条件和表达的模型,并推导出一个转换的推广超图的界限。我们进一步提出了一个隐式梯度训练过程以及基于投影的稳定策略。对引文基准的大量实验表明,IHGNN在准确性和鲁棒性方面始终优于强大的传统图/超图神经网络基线。从经验上讲,IHGNN对随机初始化和超参数变化具有弹性,突出了其强大的泛化能力和高阶关系学习的实用价值。
摘要:Many real-world interactions are group-based rather than pairwise such as papers with multiple co-authors and users jointly engaging with items. Hypergraph neural networks have shown great promise at modeling higher-order relations, but their reliance on a fixed number of explicit message-passing layers limits long-range dependency capture and can destabilize training as depth grows. In this work, we introduce Implicit Hypergraph Neural Networks (IHGNN), which bring the implicit equilibrium formulation to hypergraphs: instead of stacking layers, IHGNN computes representations as the solution to a nonlinear fixed-point equation, enabling stable and efficient global propagation across hyperedges without deep architectures. We develop a well-posed training scheme with provable convergence, analyze the oversmoothing conditions and expressivity of the model, and derive a transductive generalization bound on hypergraphs. We further present an implicit-gradient training procedure coupled with a projection-based stabilization strategy. Extensive experiments on citation benchmarks show that IHGNN consistently outperforms strong traditional graph/hypergraph neural network baselines in both accuracy and robustness. Empirically, IHGNN is resilient to random initialization and hyperparameter variation, highlighting its strong generalization and practical value for higher-order relational learning.
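
The "representation as the solution of a fixed-point equation" idea can be sketched with scalar node features. The specific update z = tanh(gamma * P z + x), the degree-normalized propagation matrix P, and the names below are assumptions for illustration, not the IHGNN formulation. Because P is row-stochastic, tanh is 1-Lipschitz, and gamma < 1, the map is a contraction, so plain iteration converges to a unique solution.

```python
import math


def propagation_matrix(incidence):
    """P = Dv^-1 H De^-1 H^T for a node-by-hyperedge 0/1 incidence matrix H.

    Assumes every node belongs to at least one hyperedge and no hyperedge is empty.
    """
    n, m = len(incidence), len(incidence[0])
    edge_deg = [sum(incidence[v][e] for v in range(n)) for e in range(m)]
    node_deg = [sum(incidence[v]) for v in range(n)]
    return [
        [
            sum(incidence[u][e] * incidence[v][e] / edge_deg[e] for e in range(m))
            / node_deg[u]
            for v in range(n)
        ]
        for u in range(n)
    ]


def step(P, z, x, gamma):
    """One application of the map z -> tanh(gamma * P z + x)."""
    n = len(z)
    return [
        math.tanh(gamma * sum(P[u][v] * z[v] for v in range(n)) + x[u])
        for u in range(n)
    ]


def solve_fixed_point(P, x, gamma=0.8, tol=1e-10, max_iter=1000):
    """Iterate the map until successive iterates differ by less than tol."""
    z = [0.0] * len(x)
    for _ in range(max_iter):
        z_next = step(P, z, x, gamma)
        if max(abs(a - b) for a, b in zip(z_next, z)) < tol:
            return z_next
        z = z_next
    return z


# Three nodes, two hyperedges: {0, 1} and {1, 2}.
H = [[1, 0], [1, 1], [0, 1]]
P = propagation_matrix(H)
x = [1.0, 0.0, -1.0]
z_star = solve_fixed_point(P, x)
```

Node 0 and node 2 share no hyperedge, yet the equilibrium value at each depends on the other's input through node 1, which is the sense in which a single implicit layer propagates information globally.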


【12】Integrating Feature Attention and Temporal Modeling for Collaborative Financial Risk Assessment
标题:集成特征注意力和时间建模进行协同金融风险评估
链接:https://arxiv.org/abs/2508.09399

作者:Zhen Xu, Youzhu Liu, Kunyuan Ma, Yuxiu Lin, Mohan Jiang
摘要:本文讨论了跨机构金融风险分析中数据隐私和协作建模的挑战。它提出了一个基于联邦学习的风险评估框架。在不共享原始数据的情况下,该方法可以跨多个机构进行联合建模和风险识别。这是通过将一个功能注意机制和时间建模结构。具体而言,该模型采用分布式优化策略。每个金融机构训练一个本地子模型。模型参数在上传之前使用差分隐私和噪声注入进行保护。然后,中央服务器汇总这些参数以生成全局模型。该全球模型用于系统性风险识别。为了验证所提出的方法的有效性,进行了多次实验。这些评估沟通效率,模型准确性,系统性风险检测和跨市场推广。结果表明,该模型在所有评估指标上都优于传统的集中式方法和现有的联邦学习变体。它在敏感的金融环境中展示了强大的建模能力和实用价值。该方法增强了风险识别的范围和效率,同时保护了数据主权。它为智能金融风险分析提供了安全高效的解决方案。
摘要:This paper addresses the challenges of data privacy and collaborative modeling in cross-institution financial risk analysis. It proposes a risk assessment framework based on federated learning. Without sharing raw data, the method enables joint modeling and risk identification across multiple institutions. This is achieved by incorporating a feature attention mechanism and temporal modeling structure. Specifically, the model adopts a distributed optimization strategy. Each financial institution trains a local sub-model. The model parameters are protected using differential privacy and noise injection before being uploaded. A central server then aggregates these parameters to generate a global model. This global model is used for systemic risk identification. To validate the effectiveness of the proposed method, multiple experiments are conducted. These evaluate communication efficiency, model accuracy, systemic risk detection, and cross-market generalization. The results show that the proposed model outperforms both traditional centralized methods and existing federated learning variants across all evaluation metrics. It demonstrates strong modeling capabilities and practical value in sensitive financial environments. The method enhances the scope and efficiency of risk identification while preserving data sovereignty. It offers a secure and efficient solution for intelligent financial risk analysis.


【13】What Can We Learn from Inter-Annotator Variability in Skin Lesion Segmentation?
标题:我们可以从皮肤病变分割中注释者间的变异性中学到什么?
链接:https://arxiv.org/abs/2508.09381

作者:ishek, Jeremy Kawahara, Ghassan Hamarneh
备注:Medical Image Computing and Computer-Assisted Intervention (MICCAI) ISIC Skin Image Analysis Workshop (MICCAI ISIC) 2025; 12 pages, 4 tables, 3 figures
摘要:医学图像分割由于模糊的对象边界、注释者偏好、专业知识和工具等因素而表现出注释者内和注释者间的可变性。边界模糊的病变,例如,毛刺状或浸润性结节,或根据ABCD规则的不规则边界,特别容易出现不一致,并且通常与恶性肿瘤有关。在这项工作中,我们策划IMA++,最大的多注释器皮肤病变分割数据集,我们进行了深入的研究,由于注释器,恶性肿瘤,工具和技能因素的变化。我们发现使用Dice测量的注释者间一致性(IAA)与皮肤病变的恶性程度之间存在统计学显著(p<0.001)相关性。我们进一步表明,IAA可以准确地预测直接从皮肤镜图像,实现平均绝对误差为0.108。最后,我们通过利用IAA作为多任务学习目标中的“软”临床特征来利用这种关联,在多个模型架构以及IMA++和四个公共皮肤镜数据集之间的平均平衡准确性提高了4.2%。该代码可在https://github.com/sfu-mial/skin-IAV上获得。
摘要:Medical image segmentation exhibits intra- and inter-annotator variability due to ambiguous object boundaries, annotator preferences, expertise, and tools, among other factors. Lesions with ambiguous boundaries, e.g., spiculated or infiltrative nodules, or irregular borders per the ABCD rule, are particularly prone to disagreement and are often associated with malignancy. In this work, we curate IMA++, the largest multi-annotator skin lesion segmentation dataset, on which we conduct an in-depth study of variability due to annotator, malignancy, tool, and skill factors. We find a statistically significant (p<0.001) association between inter-annotator agreement (IAA), measured using Dice, and the malignancy of skin lesions. We further show that IAA can be accurately predicted directly from dermoscopic images, achieving a mean absolute error of 0.108. Finally, we leverage this association by utilizing IAA as a "soft" clinical feature within a multi-task learning objective, yielding a 4.2% improvement in balanced accuracy averaged across multiple model architectures and across IMA++ and four public dermoscopic datasets. The code is available at https://github.com/sfu-mial/skin-IAV.
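
For readers unfamiliar with Dice-based agreement, the sketch below computes the Dice coefficient 2|A∩B| / (|A| + |B|) between two binary masks and averages it over all annotator pairs. The pairwise-mean aggregation and the function names are assumptions; the abstract only states that IAA is measured using Dice.

```python
from itertools import combinations


def dice(mask_a, mask_b):
    """Dice coefficient between two flattened binary masks of equal length."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def inter_annotator_agreement(masks):
    """Mean pairwise Dice over all annotators' masks for one image."""
    pairs = list(combinations(masks, 2))
    if not pairs:
        raise ValueError("need at least two annotations")
    return sum(dice(a, b) for a, b in pairs) / len(pairs)


annotations = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]
iaa = inter_annotator_agreement(annotations)
```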


【14】Synaptic Pruning: A Biological Inspiration for Deep Learning Regularization
标题:突触修剪:深度学习规则化的生物启发
链接:https://arxiv.org/abs/2508.09330

作者:s, Liza van Eijk, Zoltan Sarnyai, Mostafa Rahimi Azghadi
备注:24 pages, 7 figures
摘要:生物大脑中的突触修剪会去除弱连接以提高效率。相比之下,人工神经网络中的dropout正则化随机地使神经元失活,而不考虑依赖于活动的修剪。我们提出了一种基于幅值的突触修剪方法,通过在训练过程中逐步移除低重要性连接来更好地反映生物学机制。该方法作为dropout的替代直接集成到训练循环中,根据各层权重的绝对幅值计算权重重要性,并采用三次调度逐渐提高全局稀疏度。每隔固定间隔,修剪掩码会永久移除低重要性权重,同时保持活跃权重的梯度流,从而无需单独的修剪和微调阶段。在四个数据集上对包括RNN、LSTM和Patch Time Series Transformer在内的多个时间序列预测模型进行的实验显示出一致的收益。我们的方法总体排名最佳,Friedman检验证实了改进具有统计学显著性(p < 0.01)。在金融预测中,与不使用dropout或使用标准dropout的模型相比,它将平均绝对误差降低了最多20%,在部分Transformer模型中降低了最多52%。这种动态修剪机制将权重消除与渐进稀疏化相结合,推进了正则化方法,并且易于集成到不同架构中。其强劲的性能,尤其是在金融时间序列预测中,凸显了它作为传统dropout技术的实用替代方案的potential。
摘要:Synaptic pruning in biological brains removes weak connections to improve efficiency. In contrast, dropout regularization in artificial neural networks randomly deactivates neurons without considering activity-dependent pruning. We propose a magnitude-based synaptic pruning method that better reflects biology by progressively removing low-importance connections during training. Integrated directly into the training loop as a dropout replacement, our approach computes weight importance from absolute magnitudes across layers and applies a cubic schedule to gradually increase global sparsity. At fixed intervals, pruning masks permanently remove low-importance weights while maintaining gradient flow for active ones, eliminating the need for separate pruning and fine-tuning phases. Experiments on multiple time series forecasting models including RNN, LSTM, and Patch Time Series Transformer across four datasets show consistent gains. Our method ranked best overall, with statistically significant improvements confirmed by Friedman tests (p < 0.01). In financial forecasting, it reduced Mean Absolute Error by up to 20% over models with no or standard dropout, and up to 52% in select transformer models. This dynamic pruning mechanism advances regularization by coupling weight elimination with progressive sparsification, offering easy integration into diverse architectures. Its strong performance, especially in financial time series forecasting, highlights its potential as a practical alternative to conventional dropout techniques.
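
A cubic sparsity schedule combined with a global magnitude mask might look like the sketch below. The abstract does not give the exact schedule, so the commonly used form s(t) = s_f + (s_i - s_f)(1 - t/T)^3 is an assumption, as are the function names. The weights are a flat list standing in for all layers' parameters concatenated.

```python
def cubic_sparsity(step, total_steps, final_sparsity, initial_sparsity=0.0):
    """Target global sparsity at a training step under a cubic ramp."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the smallest-magnitude fraction of weights."""
    num_pruned = int(sparsity * len(weights))
    if num_pruned == 0:
        return [1] * len(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_pruned])
    return [0 if i in pruned else 1 for i in range(len(weights))]


weights = [0.1, -2.0, 0.05, 3.0]
mask = magnitude_mask(weights, cubic_sparsity(100, 100, final_sparsity=0.5))
pruned_weights = [w * m for w, m in zip(weights, mask)]
```

In a training loop, the mask would be recomputed at fixed intervals and multiplied into the weights after each optimizer step, so that pruned connections stay at zero while surviving weights keep receiving gradients.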


【15】MoQE: Improve Quantization Model performance via Mixture of Quantization Experts
标题:MoQE:通过混合量化专家来提高量化模型性能
链接:https://arxiv.org/abs/2508.09204

作者:ang, Yunquan Zhang, Boyang Zhang, Zeyu Liu, Daning Cheng
摘要:量化方法在提高模型效率和降低部署成本方面发挥着至关重要的作用,使深度学习模型能够在资源受限的设备上广泛应用。然而,量化过程不可避免地会带来精度下降。在本文中,我们提出了量化专家混合(Mixture of Quantization Experts,简称MoQE),这是一种基于混合专家(MoE)架构的量化推理框架,旨在共同提高量化模型的性能。MoQE将同一个全精度模型的多个量化变体组合为专门的“量化专家”,并根据输入数据的特征将其动态路由到最合适的专家。通过专门化的量化专家模型,MoQE缓解了单一量化模型中常见的性能下降。我们为CV和NLP任务设计了轻量级的结构感知路由器模型。在包括ImageNet、WikiText、C4和OpenWebText在内的基准数据集上对ResNet、LLaMA和Qwen模型系列进行的实验评估表明,MoQE实现了与SOTA量化模型相当的性能,且不会显著增加推理延迟。
摘要:Quantization methods play a crucial role in improving model efficiency and reducing deployment costs, enabling the widespread application of deep learning models on resource-constrained devices. However, the quantization process inevitably introduces accuracy degradation. In this paper, we propose Mixture of Quantization Experts (MoQE), a quantization inference framework based on the Mixture-of-Experts (MoE) architecture, aiming to jointly improve the performance of quantization models. MoQE combines multiple quantization variants of one full-precision model as specialized "quantization experts" and dynamically routes input data to the most suitable expert based on its characteristics. MoQE alleviates the performance degradation commonly seen in single quantization models through specialized quantization expert models. We design lightweight, structure-aware router models tailored for both CV and NLP tasks. Experimental evaluations on ResNet, LLaMA, and Qwen model families across benchmark datasets including ImageNet, WikiText, C4, and OpenWebText demonstrate that MoQE achieves performance comparable to SOTA quantization models, without incurring significant increases in inference latency.


【16】Energy-Efficient Stochastic Computing (SC) Neural Networks for Internet of Things Devices With Layer-Wise Adjustable Sequence Length (ASL)
标题:用于物联网设备的节能随机计算(SC)神经网络,具有逐层可调序列长度(ASL)
链接:https://arxiv.org/abs/2508.09163

作者:ng, Pedro Reviriego, Farzad Niknia, Zhen Gao, Javier Conde, Shanshan Liu, Fabrizio Lombardi
摘要:随机计算(SC)已经成为在资源有限的场景中部署神经网络(NN)的一种有效的低功耗替代方案,例如物联网(IoT)。通过将值编码为串行比特流,与传统浮点(FP)设计相比,SC显著降低了能耗;然而,SC的逐层混合精度实现的进一步改进仍未得到探索。本文介绍了可调序列长度(ASL),这是一种将混合精度概念专门应用于SC NN的新方案。通过引入一个基于算子范数的理论模型,本文表明截断噪声可以通过估计的放大因子在层中累积传播。一个扩展的灵敏度分析,使用随机森林(RF)回归评估多层截断效应,并验证理论预测与实际网络行为的对齐。为了适应不同的应用场景,本文提出了两种截断策略(粗粒度和细粒度),在每一层应用不同的序列长度配置。在32 nm处合成的流水线SC MLP上的评估表明,ASL可以将能量和延迟开销降低高达60%以上,而精度损失可以忽略不计。它证实了ASL方案在物联网应用中的可行性,并强调了混合精度截断在SC设计中的独特优势。
摘要 :Stochastic computing (SC) has emerged as an efficient low-power alternative for deploying neural networks (NNs) in resource-limited scenarios, such as the Internet of Things (IoT). By encoding values as serial bitstreams, SC significantly reduces energy dissipation compared to conventional floating-point (FP) designs; however, further improvement of layer-wise mixed-precision implementation for SC remains unexplored. This article introduces Adjustable Sequence Length (ASL), a novel scheme that applies mixed-precision concepts specifically to SC NNs. By introducing an operator-norm-based theoretical model, this article shows that truncation noise can cumulatively propagate through the layers by the estimated amplification factors. An extended sensitivity analysis is presented, using random forest (RF) regression to evaluate multilayer truncation effects and validate the alignment of theoretical predictions with practical network behaviors. To accommodate different application scenarios, this article proposes two truncation strategies (coarse-grained and fine-grained), which apply diverse sequence length configurations at each layer. Evaluations on a pipelined SC MLP synthesized at 32nm demonstrate that ASL can reduce energy and latency overheads by up to over 60% with negligible accuracy loss. It confirms the feasibility of the ASL scheme for IoT applications and highlights the distinct advantages of mixed-precision truncation in SC designs.
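
The trade-off between bitstream length and precision that motivates ASL can be seen in a minimal unipolar stochastic-computing sketch. This is textbook SC rather than the paper's design: a value in [0, 1] is encoded as a bitstream whose fraction of ones equals the value, and multiplying two independent streams reduces to a bitwise AND. The function names and the prefix-based `truncate` are assumptions for illustration.

```python
import random


def encode(value, length, rng):
    """Unipolar encoding: each bit is 1 with probability `value` (0 <= value <= 1)."""
    return [1 if rng.random() < value else 0 for _ in range(length)]


def decode(bits):
    """Estimate the encoded value as the fraction of ones."""
    return sum(bits) / len(bits)


def sc_multiply(a_bits, b_bits):
    """Multiply two independent unipolar streams with a bitwise AND."""
    return [a & b for a, b in zip(a_bits, b_bits)]


def truncate(bits, length):
    """Keep only the first `length` bits, trading precision for energy and latency."""
    return bits[:length]


rng = random.Random(0)
a = encode(0.5, 4096, rng)
b = encode(0.5, 4096, rng)
product = sc_multiply(a, b)
full_estimate = decode(product)                  # close to 0.25
short_estimate = decode(truncate(product, 64))   # noisier estimate from fewer bits
```

Shorter streams cost fewer cycles per operation but give noisier estimates, which is why choosing a different sequence length per layer, as the abstract describes, can save energy where a layer tolerates more truncation noise.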


【17】Physics-Guided Memory Network for Building Energy Modeling
标题:用于建筑能源建模的物理引导记忆网络
链接:https://arxiv.org/abs/2508.09161

作者:Umair Danish, Kashif Ali, Kamran Siddiqui, Katarina Grolinger
备注:Published version. 12 pages, 6 figures. Open access under CC BY-NC-ND 4.0 license. Publisher: Elsevier. Journal: Energy and AI
摘要:准确的能源消耗预测对于建筑部门的有效资源管理和可持续性至关重要。深度学习模型非常成功,但与有限的历史数据作斗争,并且在历史数据不可用时变得不可用,例如在新建的建筑物中。另一方面,基于物理的模型,如EnergyPlus,在不依赖历史数据的情况下模拟能耗,但需要大量的建筑参数规范和相当长的时间来模拟建筑物。本文介绍了物理引导记忆网络(PgMN),这是一种神经网络,它集成了深度学习和基于物理的模型的预测,以解决它们的局限性。PgMN包括一个并行投影层来处理不完整的输入,一个记忆单元来解释持续的偏差,和一个记忆经验模块来优化扩展预测超出其输入范围并产生输出。理论评估表明,PgMN的组件是数学上有效的执行各自的任务。PgMN在每小时分辨率的短期能源预测方面进行了评估,这对于智能电网和智能建筑系统的运营决策至关重要。实验验证表明,PgMN的准确性和适用性,在不同的场景,如新建的建筑物,丢失的数据,稀疏的历史数据,和动态的基础设施的变化。本文为动态建筑环境中的能耗预测提供了一种很有前途的解决方案,增强了历史数据有限或不可用或基于物理的模型不足时的模型适用性。
摘要:Accurate energy consumption forecasting is essential for efficient resource management and sustainability in the building sector. Deep learning models are highly successful but struggle with limited historical data and become unusable when historical data are unavailable, such as in newly constructed buildings. On the other hand, physics-based models, such as EnergyPlus, simulate energy consumption without relying on historical data but require extensive building parameter specifications and considerable time to model a building. This paper introduces a Physics-Guided Memory Network (PgMN), a neural network that integrates predictions from deep learning and physics-based models to address their limitations. PgMN comprises Parallel Projection Layers to process incomplete inputs, a Memory Unit to account for persistent biases, and a Memory Experience Module to optimally extend forecasts beyond their input range and produce output. Theoretical evaluation shows that the components of PgMN are mathematically valid for performing their respective tasks. PgMN was evaluated on short-term energy forecasting at an hourly resolution, which is critical for operational decision-making in smart grid and smart building systems. Experimental validation demonstrates the accuracy and applicability of PgMN in diverse scenarios such as newly constructed buildings, missing data, sparse historical data, and dynamic infrastructure changes. This paper provides a promising solution for energy consumption forecasting in dynamic building environments, enhancing model applicability in scenarios where historical data are limited or unavailable or when physics-based models are inadequate.


【18】Agentic TinyML for Intent-aware Handover in 6G Wireless Networks
标题:面向6G无线网络意图感知切换的智能体TinyML
链接:https://arxiv.org/abs/2508.09147

作者:h, Roberto Morabito, Sasu Tarkoma, Anders Lindgren, Susanna Pirttikangas, Lauri Lovén
摘要:随着6G网络逐渐演变为人工智能驱动的、以用户为中心的生态系统,传统的被动式切换机制暴露出局限性,特别是在移动边缘计算和基于自主智能体的服务场景中。本文介绍了WAAN,这是一个跨层框架,它将轻量级TinyML智能体作为具有自主性和协商能力的实体嵌入异构边缘节点,由这些节点参与意图传播和网络适应,从而实现意图感知的主动切换。为了在移动性引起的中断期间保持连续性,WAAN引入了半稳定的会合点,作为上下文迁移和状态保存的协调锚点。本文通过一个多模态环境控制案例研究展示了该框架的运行能力,突出了其在移动条件下保持用户体验的有效性。最后,本文讨论了与WAAN的部署和演进相关的关键挑战和未来机遇。
摘要:As 6G networks evolve into increasingly AI-driven, user-centric ecosystems, traditional reactive handover mechanisms demonstrate limitations, especially in mobile edge computing and autonomous agent-based service scenarios. This manuscript introduces WAAN, a cross-layer framework that enables intent-aware and proactive handovers by embedding lightweight TinyML agents as autonomous, negotiation-capable entities across heterogeneous edge nodes that contribute to intent propagation and network adaptation. To ensure continuity across mobility-induced disruptions, WAAN incorporates semi-stable rendezvous points that serve as coordination anchors for context transfer and state preservation. The framework's operational capabilities are demonstrated through a multimodal environmental control case study, highlighting its effectiveness in maintaining user experience under mobility. Finally, the article discusses key challenges and future opportunities associated with the deployment and evolution of WAAN.


【19】DeepWKB: Learning WKB Expansions of Invariant Distributions for Stochastic Systems
标题:DeepWKB:学习随机系统不变分布的WKB展开
链接:https://arxiv.org/abs/2508.09529

作者:icheng Liu, Shirou Wang
备注:29 pages, 7 figures
摘要:本文介绍了一种新的深度学习方法DeepWKB,用于通过WKB近似$u_\epsilon(x) = Q(\epsilon)^{-1} Z_\epsilon(x) \exp\{-V(x)/\epsilon\}$估计随机扰动系统的不变分布,其中$V$称为准势,$\epsilon$表示噪声强度,$Q(\epsilon)$是归一化因子。DeepWKB方法同时利用Monte Carlo数据以及$V$和$Z_\epsilon$所满足的偏微分方程,分别计算$V$和$Z_\epsilon$。这使得在$\epsilon$足够小的奇异区域内近似不变分布成为可能,而这对大多数现有方法仍是重大挑战。此外,DeepWKB方法适用于确定性对应系统具有非平凡吸引子的高维随机系统。特别是,它为计算准势提供了一种可扩展且灵活的替代方案,而准势在罕见事件、亚稳态和复杂系统随机稳定性的分析中起着关键作用。
摘要:This paper introduces a novel deep learning method, called DeepWKB, for estimating the invariant distribution of randomly perturbed systems via its Wentzel-Kramers-Brillouin (WKB) approximation $u_\epsilon(x) = Q(\epsilon)^{-1} Z_\epsilon(x) \exp\{-V(x)/\epsilon\}$, where $V$ is known as the quasi-potential, $\epsilon$ denotes the noise strength, and $Q(\epsilon)$ is the normalization factor. By utilizing both Monte Carlo data and the partial differential equations satisfied by $V$ and $Z_\epsilon$, the DeepWKB method computes $V$ and $Z_\epsilon$ separately. This enables an approximation of the invariant distribution in the singular regime where $\epsilon$ is sufficiently small, which remains a significant challenge for most existing methods. Moreover, the DeepWKB method is applicable to higher-dimensional stochastic systems whose deterministic counterparts admit non-trivial attractors. In particular, it provides a scalable and flexible alternative for computing the quasi-potential, which plays a key role in the analysis of rare events, metastability, and the stochastic stability of complex systems.
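As a concrete instance of the WKB form above, consider the one-dimensional Ornstein-Uhlenbeck system $dX = -X\,dt + \sqrt{\epsilon}\,dW$, a textbook case chosen here for illustration and not taken from the paper. Its stationary Fokker-Planck equation gives $V(x) = x^2$, $Z_\epsilon(x) = 1$ and $Q(\epsilon) = \sqrt{\pi\epsilon}$ in closed form. The sketch below numerically checks that the resulting density integrates to 1 and has variance $\epsilon/2$; all function names are hypothetical.

```python
import math

def quasi_potential(x):
    """V(x) = x^2 for dX = -X dt + sqrt(eps) dW (textbook example, not from the paper)."""
    return x * x

def prefactor(x, eps):
    """Z_eps(x) is constant for this linear system."""
    return 1.0

def normalizer(eps):
    """Q(eps) = integral of exp(-x^2 / eps) over the real line = sqrt(pi * eps)."""
    return math.sqrt(math.pi * eps)

def wkb_density(x, eps):
    """u_eps(x) = Q(eps)^{-1} Z_eps(x) exp(-V(x) / eps)."""
    return prefactor(x, eps) * math.exp(-quasi_potential(x) / eps) / normalizer(eps)

def integrate(f, lo, hi, n=20000):
    """Composite trapezoid rule on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

eps = 0.05  # small noise strength: the density concentrates near the attractor x = 0
mass = integrate(lambda x: wkb_density(x, eps), -3.0, 3.0)
variance = integrate(lambda x: x * x * wkb_density(x, eps), -3.0, 3.0)
print(mass, variance, eps / 2)
```

In this linear case everything is available analytically; the abstract targets systems where $V$ and $Z_\epsilon$ must instead be learned.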


【20】ProMode: A Speech Prosody Model Conditioned on Acoustic and Textual Inputs
标题:ProMode:一个以声学和文本输入为条件的语音韵律模型
链接:https://arxiv.org/abs/2508.09389

作者:, Qingju Liu, Hyeongwoo Kim, Pablo Garrido, Abeer Alwan
备注:Interspeech 2025; demo page at this https URL
摘要:韵律不仅传达了语音信号丰富的情感和语义信息,而且也传达了个体的个性特征。我们提出了一个独立的模型,将文本映射到韵律特征,如F0和能量,并可用于下游任务,如TTS。ProMode编码器将声学特征和时间对齐的文本内容作为输入,两者都被部分掩蔽,并获得固定长度的潜在韵律嵌入。解码器使用编码的韵律输入和未掩蔽的文本内容两者来预测掩蔽区域中的声学。在GigaSpeech数据集上训练,我们将我们的方法与最先进的风格编码器进行了比较。对于F0和能量预测,我们在不同的粒度级别上显示了我们的模型的一致改进。我们还将这些预测的韵律特征集成到TTS系统中,并进行感知测试,与基线相比,这些测试显示出更高的韵律偏好,表明该模型在韵律建模很重要的任务中具有潜力。
摘要:Prosody conveys rich emotional and semantic information of the speech signal as well as individual idiosyncrasies. We propose a stand-alone model that maps text to prosodic features such as F0 and energy and can be used in downstream tasks such as TTS. The ProMode encoder takes as input acoustic features and time-aligned textual content, both partially masked, and obtains a fixed-length latent prosodic embedding. The decoder predicts acoustics in the masked region using both the encoded prosody input and unmasked textual content. We train the model on the GigaSpeech dataset and compare our method with state-of-the-art style encoders. For F0 and energy predictions, we show consistent improvements for our model at different levels of granularity. We also integrate these predicted prosodic features into a TTS system and conduct perceptual tests, which show higher prosody preference compared to the baselines, demonstrating the model's potential in tasks where prosody modeling is important.


【21】Classifying Cool Dwarfs: Comprehensive Spectral Typing of Field and Peculiar Dwarfs Using Machine Learning
标题:冷矮星分类:使用机器学习对场矮星和特殊矮星进行全面光谱分型
链接:https://arxiv.org/abs/2508.09370

作者:Zhou, Christopher A. Theissen, S. Jean Feeser, William M. J. Best, Adam J. Burgasser, Kelle L. Cruz, Lexu Zhao
备注:35 pages, 24 figures, 9 tables, accepted for publication in The Astrophysical Journal
摘要:低质量恒星和褐矮星-光谱类型(SpTs)M0和更晚-在研究恒星和亚恒星过程和人口统计方面发挥着重要作用,达到行星质量物体。目前,这些来源的分类仍然严重依赖于光谱特征的目视检查,等效宽度测量或窄/宽带光谱指数。机器学习(ML)方法的最新进展提供了光谱分类的自动化方法,随着Gaia,SDSS和SPHEREx等大型光谱调查生成包含数百万光谱的数据集,这种方法变得越来越重要。我们研究了ML在低分辨率(R $\sim $120)近红外光谱的M0-T9矮星的光谱类型分类的应用与SpeX仪器上的NASA红外望远镜设施。我们的具体目标是分类的重力和金属依赖的子类晚型矮星。我们使用分箱通量作为输入特征,并比较了使用随机森林(RF),支持向量机(SVM)和K-最近邻(KNN)模型构建的谱型估计器的功效。我们测试了不同标准化的影响,并分析了不同光谱区域对于表面重力和金属丰度亚类分类的相对重要性。我们表现最好的模型(使用KNN)将95.5 $\pm $0.6%的源分类到$\pm $1 SpT内,并以89.5 $\pm $0.9%的准确度分配表面重力和金属丰度子类。我们测试了信噪比对分类精度的依赖性,发现SNR $\gtrsim $60的源具有$\gtrsim $95%的准确性。我们还发现,ZY波段在RF模型中起着最突出的作用,FeH和TiO具有最高的特征重要性。
摘要:Low-mass stars and brown dwarfs -- spectral types (SpTs) M0 and later -- play a significant role in studying stellar and substellar processes and demographics, reaching down to planetary-mass objects. Currently, the classification of these sources remains heavily reliant on visual inspection of spectral features, equivalent width measurements, or narrow-/wide-band spectral indices. Recent advances in machine learning (ML) methods offer automated approaches for spectral typing, which are becoming increasingly important as large spectroscopic surveys such as Gaia, SDSS, and SPHEREx generate datasets containing millions of spectra. We investigate the application of ML in spectral type classification on low-resolution (R $\sim$ 120) near-infrared spectra of M0--T9 dwarfs obtained with the SpeX instrument on the NASA Infrared Telescope Facility. We specifically aim to classify the gravity- and metallicity-dependent subclasses for late-type dwarfs. We used binned fluxes as input features and compared the efficacy of spectral type estimators built using Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN) models. We tested the influence of different normalizations and analyzed the relative importance of different spectral regions for surface gravity and metallicity subclass classification. Our best-performing model (using KNN) classifies 95.5 $\pm$ 0.6% of sources to within $\pm$1 SpT, and assigns surface gravity and metallicity subclasses with 89.5 $\pm$ 0.9% accuracy. We test the dependence of signal-to-noise ratio on classification accuracy and find sources with SNR $\gtrsim$ 60 have $\gtrsim$ 95% accuracy. We also find that zy-band plays the most prominent role in the RF model, with FeH and TiO having the highest feature importance.
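The evaluation idea in this abstract (a K-Nearest Neighbor classifier on binned fluxes, scored by the fraction of sources typed to within ±1 SpT) can be sketched as follows. The numeric encoding of spectral types and the two-bin toy "spectra" are assumptions made for illustration only; they are not the paper's data or feature set.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    ranked = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

def within_one_subtype(true_types, predicted_types):
    """Fraction of sources whose predicted type is within +/-1 subtype of the truth."""
    hits = sum(abs(t - p) <= 1 for t, p in zip(true_types, predicted_types))
    return hits / len(true_types)

# Hypothetical binned fluxes (two bins per source) and numerically encoded types.
# The encoding (e.g. M0 = 0, L0 = 10, T0 = 20) is an assumption for this sketch.
train_x = [(1.0, 0.1), (0.9, 0.2), (1.1, 0.15), (0.1, 1.0), (0.2, 0.9), (0.15, 1.1)]
train_y = [2, 2, 3, 15, 15, 16]

queries = [(1.0, 0.12), (0.12, 1.0)]
predicted = [knn_predict(train_x, train_y, q) for q in queries]
print(predicted, within_one_subtype([3, 15], predicted))
```

Scoring within a tolerance rather than by exact match reflects that adjacent subtypes are hard to separate even by eye.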


【22】Objective Soups: Multilingual Multi-Task Modeling for Speech Processing
标题:目标汤:语音处理的多语言多任务建模
链接:https://arxiv.org/abs/2508.09228

作者:f, Lisha Chen, Xiaodong Cui, Songtao Lu, Brian Kingsbury, Tianyi Chen
摘要:为多语言、多任务语音处理(MSP)训练单一模型,会受到语音识别和翻译等任务之间目标冲突的严重阻碍。虽然多目标优化(MOO)旨在对齐梯度更新,但其有效性随着任务数量的增加而降低,因此很难找到共同的下降方向。这就提出了一个根本性的问题:高度冲突的目标应该联合优化,还是拆分为层次结构分别优化?为了回答这个问题,本文研究了三种多目标MSP形式化方案,我们称之为“目标汤配方”。这些方案在不同的优化层级应用多目标优化,以减轻所有目标之间的潜在冲突。为了保证效率,我们引入了一个轻量级的层选择机制,只使用冲突最严重的层来计算避免冲突的梯度,从而最大限度地减少计算和内存开销。在CoVoST v2、LibriSpeech和AISHELL-1上进行的大量实验表明,将识别和翻译任务分开的双层配方始终优于标准的平面优化。我们的工作表明,分层MOO是构建最先进MSP模型的一种更有效、更可扩展的方法。我们的代码已在https://github.com/afmsaif/Objective_Soups上发布。
摘要:Training a single model for multilingual, multi-task speech processing (MSP) is severely hampered by conflicting objectives between tasks like speech recognition and translation. While multi-objective optimization (MOO) aims to align gradient updates, its effectiveness diminishes as the number of tasks grows, making it difficult to find a common descent direction. This raises a fundamental question: should highly conflicting objectives be optimized jointly or separated into a hierarchical structure? To address this question, this paper investigates three multi-objective MSP formulations, which we refer to as "objective soup recipes". These formulations apply multi-objective optimization at different optimization levels to mitigate potential conflicts among all objectives. To ensure efficiency, we introduce a lightweight layer-selection mechanism that computes the conflict-avoiding gradient using only the most problematic layers, minimizing computational and memory overhead. Extensive experiments on CoVoST v2, LibriSpeech, and AISHELL-1 reveal that a bi-level recipe separating recognition and translation tasks consistently outperforms standard flat optimization. Our work demonstrates that hierarchical MOO is a more effective and scalable approach for building state-of-the-art MSP models. Our code has been released at https://github.com/afmsaif/Objective_Soups.
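To make "conflicting objectives" concrete: two task gradients conflict when their dot product is negative, so a step that helps one task hurts the other. The sketch below shows a PCGrad-style projection, a well-known conflict-avoiding technique used here only as a stand-in; the abstract does not state which conflict-avoiding gradient the authors compute, and the function names and gradient values are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length gradient vectors."""
    return sum(a * b for a, b in zip(u, v))

def project_out_conflict(g, other):
    """PCGrad-style step (stand-in technique): if g conflicts with `other`
    (negative dot product), remove the component of g along `other`."""
    d = dot(g, other)
    if d >= 0:
        return list(g)  # no conflict, leave the gradient unchanged
    scale = d / dot(other, other)
    return [a - scale * b for a, b in zip(g, other)]

# Hypothetical gradients of a recognition loss and a translation loss.
g_asr = [1.0, 0.0]
g_st = [-1.0, 1.0]

g_asr_fixed = project_out_conflict(g_asr, g_st)
print(dot(g_asr, g_st), g_asr_fixed, dot(g_asr_fixed, g_st))
```

After projection the adjusted gradient no longer opposes the other task, which is the property any common descent direction must have.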


【23】Exploring Molecular Odor Taxonomies for Structure-based Odor Predictions using Machine Learning
标题:使用机器学习探索分子气味分类以进行基于结构的气味预测
链接:https://arxiv.org/abs/2508.09217

作者:jan, Stijn Sluis, Reza Haydarlou, Sanne Abeln, Pasquale Lisena, Raphael Troncy, Caro Verbeek, Inger Leemans, Halima Mouhib
备注:24 pages (58 pages including supporting information), 9 Figures, 4 Tables; additional Tables and Figures in the supporting information
摘要:从分子结构预测气味的关键挑战之一无疑是我们对气味空间和潜在结构-气味关系的复杂性的有限理解。在这里,我们表明,基于结构的气味预测的机器学习模型的预测性能可以使用专家和数据驱动的气味分类来提高。专家分类法是基于语义和感知的相似性,而数据驱动的分类法是基于直接从准备好的数据集的气味描述符的聚类共现模式。这两种分类法都改进了不同机器学习模型的预测,并且优于不反映气味描述符之间现有关系的描述符的随机分组。我们通过它们在不同气味类别中的预测性能来评估这两种分类法的质量,并进行深入的错误分析,突出气味-结构关系的复杂性,并通过展示香水中使用的梨气味剂来识别分类法中潜在的不一致性。数据驱动的分类法使我们能够批判性地评估我们的专家分类法,并更好地了解分子气味空间。这两个分类以及完整的数据集提供给社区,为未来社区驱动的气味分子基础的探索提供了一块垫脚石。此外,我们提供了一个详细的多层专家分类法,包括来自Pyrfume存储库的777个不同的描述符。
摘要:One of the key challenges in predicting odor from molecular structure is unarguably our limited understanding of the odor space and the complexity of the underlying structure-odor relationships. Here, we show that the predictive performance of machine learning models for structure-based odor predictions can be improved using both an expert and a data-driven odor taxonomy. The expert taxonomy is based on semantic and perceptual similarities, while the data-driven taxonomy is based on clustering co-occurrence patterns of odor descriptors directly from the prepared dataset. Both taxonomies improve the predictions of different machine learning models and outperform random groupings of descriptors that do not reflect existing relations between odor descriptors. We assess the quality of both taxonomies through their predictive performance across different odor classes and perform an in-depth error analysis highlighting the complexity of odor-structure relationships and identifying potential inconsistencies within the taxonomies by showcasing pear odorants used in perfumery. The data-driven taxonomy allows us to critically evaluate our expert taxonomy and better understand the molecular odor space. Both taxonomies as well as a full dataset are made available to the community, providing a stepping stone for a future community-driven exploration of the molecular basis of smell. In addition, we provide a detailed multi-layer expert taxonomy including a total of 777 different descriptors from the Pyrfume repository.


【24】Real-time deep learning phase imaging flow cytometer reveals blood cell aggregate biomarkers for haematology diagnostics
标题:实时深度学习相位成像流式细胞仪揭示血液学诊断的血细胞聚集生物标志物
链接:https://arxiv.org/abs/2508.09215

作者:ikoyun, Qianyu Chen, Liu Wei, Si Ko Myo, Johannes Krell, Martin Schlegel, Win Sen Kuan, John Tshon Yit Soong, Gerhard Schneider, Clarissa Prazeres da Costa, Percy A. Knolle, Laurent Renia, Matthew Edward Cove, Hwee Kuan Lee, Klaus Diepold, Oliver Hayden
摘要:虽然分析罕见的血细胞聚集体在自动血液学中仍然具有挑战性,但它们可以显着推进无标记功能诊断。传统的流式细胞仪有效地进行白细胞分类的细胞计数,但不能识别具有标记结果的聚集体,需要人工检查。定量相位成像流式细胞术捕获详细的聚集体形态,但临床使用受到大量数据存储和离线处理的阻碍。将隐藏的生物标志物添加到常规血液学检查中将显著改善诊断,而无需标记结果。我们提出了RT-HAD,这是一种用于离轴数字全息显微镜(DHM)的端到端基于深度学习的图像和数据处理框架,它结合了物理一致的全息重建和检测,在图形中表示每个血细胞以识别聚集体。RT-HAD实时处理超过30 GB的图像数据,周转时间<1.5分钟,血小板聚集体检测的错误率为8.9%,与血液学生物标志物的可接受实验室错误率相匹配,并解决了即时诊断的大数据挑战。
摘要:While analysing rare blood cell aggregates remains challenging in automated haematology, they could markedly advance label-free functional diagnostics. Conventional flow cytometers efficiently perform cell counting with leukocyte differentials but fail to identify aggregates with flagged results, requiring manual reviews. Quantitative phase imaging flow cytometry captures detailed aggregate morphologies, but clinical use is hampered by massive data storage and offline processing. Incorporating hidden biomarkers into routine haematology panels would significantly improve diagnostics without flagged results. We present RT-HAD, an end-to-end deep learning-based image and data processing framework for off-axis digital holographic microscopy (DHM), which combines physics-consistent holographic reconstruction and detection, representing each blood cell in a graph to recognize aggregates. RT-HAD processes >30 GB of image data on-the-fly with turnaround time of <1.5 min and error rate of 8.9% in platelet aggregate detection, which matches acceptable laboratory error rates of haematology biomarkers and solves the big data challenge for point-of-care diagnostics.


【25】Deep Generative Models for Discrete Genotype Simulation
标题:用于离散基因型模拟的深度生成模型
链接:https://arxiv.org/abs/2508.09212

作者: (GABI), Thierry Tribout (GABI), Didier Boichard (GABI), Blaise Hanczar (IBISC), Julien Chiquet (MIA Paris-Saclay), Eric Barrey (GABI)
摘要:深度生成模型为模拟真实的基因组数据开辟了新的途径,同时保护隐私并解决数据可访问性限制。虽然以前的研究主要集中在生成基因表达或单体型数据,但本研究探索了在无条件和表型条件设置中生成基因型数据,由于基因型数据的离散性,这本质上更具挑战性。在这项工作中,我们开发和评估了常用的生成模型,包括变分自动编码器(VAE)、扩散模型和生成对抗网络(GANs),并提出了针对离散基因型数据的自适应方法。我们在大规模数据集上进行了广泛的实验,包括来自奶牛的所有染色体和来自人类的多条染色体。使用从深度学习和定量遗传学文献中提取的一组完善的指标来评估模型性能。我们的研究结果表明,这些模型可以有效地捕捉遗传模式和保存基因型-表型关联。我们的研究结果提供了一个全面的比较这些模型,并为未来的基因型模拟研究提供了实用的指导方针。我们已在https://github.com/SihanXXX/DiscreteGenoGen上公开了我们的代码。
摘要:Deep generative models open new avenues for simulating realistic genomic data while preserving privacy and addressing data accessibility constraints. While previous studies have primarily focused on generating gene expression or haplotype data, this study explores generating genotype data in both unconditioned and phenotype-conditioned settings, which is inherently more challenging due to the discrete nature of genotype data. In this work, we developed and evaluated commonly used generative models, including Variational Autoencoders (VAEs), Diffusion Models, and Generative Adversarial Networks (GANs), and proposed adaptation tailored to discrete genotype data. We conducted extensive experiments on large-scale datasets, including all chromosomes from cow and multiple chromosomes from human. Model performance was assessed using a well-established set of metrics drawn from both deep learning and quantitative genetics literature. Our results show that these models can effectively capture genetic patterns and preserve genotype-phenotype association. Our findings provide a comprehensive comparison of these models and offer practical guidelines for future research in genotype simulation. We have made our code publicly available at https://github.com/SihanXXX/DiscreteGenoGen.


其他(22篇)

【1】GBC: Generalized Behavior-Cloning Framework for Whole-Body Humanoid Imitation
标题:GBC:全身类人模仿的广义行为克隆框架
链接:https://arxiv.org/abs/2508.09960

作者:, Chengyuan Luo, Jiaheng Du, Wentao He, Jun-Guo Lu
摘要:类人人形机器人的研制受到一个根本性的碎片化问题的阻碍:数据处理和学习算法很少能在不同的机器人形态之间通用。本文介绍了广义行为克隆(GBC)框架,这是一个旨在解决这一端到端挑战的全面统一的解决方案。GBC通过三项协同创新,建立了从人体运动到机器人动作的完整路径。首先,自适应数据管道利用可微分的IK网络,将任意人类MoCap数据自动重定向到任意人形机器人。在此基础上,我们新提出的DAgger-MMPPO算法及其MMTransformer架构可以学习鲁棒的高保真模仿策略。为了完善整个生态,整个框架以基于Isaac Lab的高效开源平台的形式提供,使社区能够通过简单的配置脚本部署完整的工作流程。我们通过在多个异构人形机器人上训练策略来验证GBC的能力和通用性,展示了出色的性能以及向新动作的迁移能力。这项工作为构建真正通用的人形机器人控制器建立了第一条实用且统一的途径。
摘要:The creation of human-like humanoid robots is hindered by a fundamental fragmentation: data processing and learning algorithms are rarely universal across different robot morphologies. This paper introduces the Generalized Behavior Cloning (GBC) framework, a comprehensive and unified solution designed to solve this end-to-end challenge. GBC establishes a complete pathway from human motion to robot action through three synergistic innovations. First, an adaptive data pipeline leverages a differentiable IK network to automatically retarget any human MoCap data to any humanoid. Building on this foundation, our novel DAgger-MMPPO algorithm with its MMTransformer architecture learns robust, high-fidelity imitation policies. To complete the ecosystem, the entire framework is delivered as an efficient, open-source platform based on Isaac Lab, empowering the community to deploy the full workflow via simple configuration scripts. We validate the power and generality of GBC by training policies on multiple heterogeneous humanoids, demonstrating excellent performance and transfer to novel motions. This work establishes the first practical and unified pathway for creating truly generalized humanoid controllers.


【2】Prototype-Guided Diffusion: Visual Conditioning without External Memory
标题:原型引导扩散:无需外部记忆的视觉条件化
链接:https://arxiv.org/abs/2508.09922

作者:e, Hanane Azzag, Mustapha Lebbah
摘要:扩散模型已成为高质量图像生成的领先框架,在不同领域提供稳定的训练和强大的性能。然而,它们的计算开销仍然很大,特别是在迭代去噪过程中。像Stable Diffusion这样的潜在空间模型通过在压缩表示中运算来减轻部分成本,但代价是损失细粒度的细节。较新的方法,如检索增强扩散模型(RDM),通过以从大型外部记忆库中检索到的相似样本作为去噪条件来提高效率。这些方法虽然有效,但也存在缺点:它们需要昂贵的存储和检索基础设施,依赖CLIP等静态视觉语言模型来度量相似性,并且在训练过程中缺乏适应性。我们提出了原型扩散模型(PDM),该方法将原型学习直接集成到扩散过程中,无需外部记忆即可实现高效且自适应的视觉条件化。PDM不检索参考样本,而是使用对比学习从干净的图像特征中构建一组动态的紧凑视觉原型。这些原型通过将含噪表示与语义相关的视觉模式对齐来指导去噪步骤,从而实现具有强语义基础的高效生成。实验表明,PDM在保持高生成质量的同时减少了计算和存储开销,为扩散模型中基于检索的条件化提供了一种可扩展的替代方案。
摘要:Diffusion models have emerged as a leading framework for high-quality image generation, offering stable training and strong performance across diverse domains. However, they remain computationally intensive, particularly during the iterative denoising process. Latent-space models like Stable Diffusion alleviate some of this cost by operating in compressed representations, though at the expense of fine-grained detail. More recent approaches such as Retrieval-Augmented Diffusion Models (RDM) address efficiency by conditioning denoising on similar examples retrieved from large external memory banks. While effective, these methods introduce drawbacks: they require costly storage and retrieval infrastructure, depend on static vision-language models like CLIP for similarity, and lack adaptability during training. We propose the Prototype Diffusion Model (PDM), a method that integrates prototype learning directly into the diffusion process for efficient and adaptive visual conditioning - without external memory. Instead of retrieving reference samples, PDM constructs a dynamic set of compact visual prototypes from clean image features using contrastive learning. These prototypes guide the denoising steps by aligning noisy representations with semantically relevant visual patterns, enabling efficient generation with strong semantic grounding. Experiments show that PDM maintains high generation quality while reducing computational and storage overhead, offering a scalable alternative to retrieval-based conditioning in diffusion models.


【3】Rare anomalies require large datasets: About proving the existence of anomalies
标题:罕见异常需要大型数据集:关于证明异常的存在
链接:https://arxiv.org/abs/2508.09894

作者:ttermann, Emmanuel Müller
备注:13 pages, 8 figures
摘要:检测数据集内是否存在任何异常对于有效的异常检测至关重要,但这一问题在异常检测文献中的研究却出人意料地不足。本文对一个根本问题进行了全面研究:我们何时才能确定异常确实存在?通过涵盖多种异常检测任务和算法、超过300万次统计检验的大量实验,我们确定了数据集大小、污染率和算法相关常数$ \alpha_{\text{algo}} $之间的关系。我们的结果表明,对于大小为$ N $、污染率为$ \nu $的未标记数据集,条件$ N \ge \frac{\alpha_{\text{algo}}}{\nu^2} $给出了确认异常存在所需样本数量的下限。这个阈值意味着,异常的罕见程度存在一个极限,超过该极限后,证明异常存在将变得不可行。
摘要:Detecting whether any anomalies exist within a dataset is crucial for effective anomaly detection, yet it remains surprisingly underexplored in anomaly detection literature. This paper presents a comprehensive study that addresses the fundamental question: When can we conclusively determine that anomalies are present? Through extensive experimentation involving over three million statistical tests across various anomaly detection tasks and algorithms, we identify a relationship between the dataset size, contamination rate, and an algorithm-dependent constant $ \alpha_{\text{algo}} $. Our results demonstrate that, for an unlabeled dataset of size $ N $ and contamination rate $ \nu $, the condition $ N \ge \frac{\alpha_{\text{algo}}}{\nu^2} $ represents a lower bound on the number of samples required to confirm anomaly existence. This threshold implies a limit to how rare anomalies can be before proving their existence becomes infeasible.
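The bound $N \ge \alpha_{\text{algo}}/\nu^2$ is easy to work with numerically. The sketch below computes the minimum dataset size for a given contamination rate and, inversely, the rarest contamination rate a dataset of size $N$ can support. The value of $\alpha_{\text{algo}}$ used here is a made-up placeholder, since the abstract does not report the constants; the function names are likewise hypothetical.

```python
import math

def min_samples(alpha_algo, nu):
    """Smallest integer N satisfying N >= alpha_algo / nu**2."""
    if not 0 < nu <= 1:
        raise ValueError("contamination rate nu must lie in (0, 1]")
    return math.ceil(alpha_algo / nu ** 2)

def rarest_detectable_rate(alpha_algo, n):
    """Smallest contamination rate nu for which n samples still meet the bound."""
    return math.sqrt(alpha_algo / n)

alpha = 2.0  # hypothetical algorithm-dependent constant, not a value from the paper

n_quarter = min_samples(alpha, 0.25)    # contamination rate of 25%
n_eighth = min_samples(alpha, 0.125)    # halving the rate quadruples the requirement
print(n_quarter, n_eighth, rarest_detectable_rate(alpha, n_quarter))
```

The inverse-square dependence is the practical message: each halving of the anomaly rate requires four times as much data.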


【4】FedShard: Federated Unlearning with Efficiency Fairness and Performance Fairness
标题:FedShard:兼顾效率公平性和性能公平性的联邦遗忘学习
链接:https://arxiv.org/abs/2508.09866

作者:n, Meng Zhang, Yang Yang, Ningning Ding
摘要:为了保护客户端在联邦学习中的被遗忘权,联邦遗忘学习旨在从全局模型中移除离开的客户端的数据贡献。现有研究主要集中在提高遗忘的效率和有效性上,而遗忘过程中去中心化客户端之间的效率公平性和性能公平性这两个关键方面在很大程度上仍未得到探索。在这项研究中,我们提出了FedShard,这是第一个旨在同时保证效率公平性和性能公平性的联邦遗忘算法。FedShard自适应地应对收敛性、遗忘效率和遗忘公平性之间的两难所带来的挑战。此外,我们提出了两个新的指标来定量评估遗忘算法的公平性,并证明它们满足其他现有公平性度量中的常见性质。我们的理论分析和数值评估验证了FedShard在遗忘性能和效率两方面的公平性。我们证明了FedShard能够减轻级联离开和投毒攻击等不公平风险,并使各客户端的遗忘成本更加均衡。实验结果表明,FedShard的数据遗忘速度比从头重新训练快1.3-6.2倍,比最先进的精确遗忘方法快4.9倍。
摘要:To protect clients' right to be forgotten in federated learning, federated unlearning aims to remove the data contribution of leaving clients from the global learned model. While current studies mainly focused on enhancing unlearning efficiency and effectiveness, the crucial aspects of efficiency fairness and performance fairness among decentralized clients during unlearning have remained largely unexplored. In this study, we introduce FedShard, the first federated unlearning algorithm designed to concurrently guarantee both efficiency fairness and performance fairness. FedShard adaptively addresses the challenges introduced by dilemmas among convergence, unlearning efficiency, and unlearning fairness. Furthermore, we propose two novel metrics to quantitatively assess the fairness of unlearning algorithms, which we prove to satisfy well-known properties in other existing fairness measurements. Our theoretical analysis and numerical evaluation validate FedShard's fairness in terms of both unlearning performance and efficiency. We demonstrate that FedShard mitigates unfairness risks such as cascaded leaving and poisoning attacks and realizes more balanced unlearning costs among clients. Experimental results indicate that FedShard accelerates the data unlearning process 1.3-6.2 times faster than retraining from scratch and 4.9 times faster than the state-of-the-art exact unlearning methods.


【5】Provable In-Context Vector Arithmetic via Retrieving Task Concepts
标题:通过检索任务概念实现可证明的上下文向量运算
链接:https://arxiv.org/abs/2508.09820

作者:Wei Huang, Andi Han, Atsushi Nitanda, Qingfu Zhang, Hau-San Wong, Taiji Suzuki
备注:Accepted by the 42nd International Conference on Machine Learning (ICML 2025)
摘要:上下文学习(ICL)因其从演示中掌握函数/任务的能力而受到广泛关注。最近的研究表明,LLM在ICL过程中存在潜在的任务/函数向量。Merullo等人(2024)表明,LLM利用这个向量以及残差流进行类似Word2Vec的向量运算,从而解决事实回忆类ICL任务。此外,最近的工作从实证角度强调了问答数据在提高事实回忆能力方面的关键作用。尽管有这些见解,理论上的解释仍然缺失。为了向前迈进一步,我们提出了一个建立在有实证依据的层次概念建模之上的理论框架。我们发展了一套优化理论,展示了通过对交叉熵损失做梯度下降训练的非线性残差Transformer如何通过向量运算执行事实回忆类ICL任务。我们证明了0-1损失的收敛性,并展示了强泛化能力,包括对概念重组和分布偏移的鲁棒性。这些结果阐明了Transformer相对于其静态嵌入前身的优势。实证模拟证实了我们的理论见解。
摘要:In-context learning (ICL) has garnered significant attention for its ability to grasp functions/tasks from demonstrations. Recent studies suggest the presence of a latent task/function vector in LLMs during ICL. Merullo et al. (2024) showed that LLMs leverage this vector alongside the residual stream for Word2Vec-like vector arithmetic, solving factual-recall ICL tasks. Additionally, recent work empirically highlighted the key role of Question-Answer data in enhancing factual-recall capabilities. Despite these insights, a theoretical explanation remains elusive. To move one step forward, we propose a theoretical framework building on empirically grounded hierarchical concept modeling. We develop an optimization theory, showing how nonlinear residual transformers trained via gradient descent on cross-entropy loss perform factual-recall ICL tasks via vector arithmetic. We prove 0-1 loss convergence and show the strong generalization, including robustness to concept recombination and distribution shifts. These results elucidate the advantages of transformers over static embedding predecessors. Empirical simulations corroborate our theoretical insights.
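The Word2Vec-like vector arithmetic mentioned in this abstract can be pictured with a toy example: a task vector is estimated from demonstration pairs and added to the query embedding, and the answer is the nearest remaining word. The hand-built four-dimensional embeddings and all names below are assumptions for illustration; this is not the paper's transformer construction or proof setting.

```python
import math

# Hand-built embeddings: three "country" axes plus one shared "capital-of" axis.
EMBEDDINGS = {
    "France": (1.0, 0.0, 0.0, 0.0), "Paris": (1.0, 0.0, 0.0, 1.0),
    "Japan":  (0.0, 1.0, 0.0, 0.0), "Tokyo": (0.0, 1.0, 0.0, 1.0),
    "Italy":  (0.0, 0.0, 1.0, 0.0), "Rome":  (0.0, 0.0, 1.0, 1.0),
}

def task_vector(demonstrations):
    """Average of (answer - query) differences over the in-context demonstrations."""
    dims = len(next(iter(EMBEDDINGS.values())))
    total = [0.0] * dims
    for query, answer in demonstrations:
        for i in range(dims):
            total[i] += EMBEDDINGS[answer][i] - EMBEDDINGS[query][i]
    return [t / len(demonstrations) for t in total]

def answer(query, demonstrations):
    """Add the task vector to the query embedding and return the nearest other word."""
    shift = task_vector(demonstrations)
    target = [q + s for q, s in zip(EMBEDDINGS[query], shift)]
    candidates = (w for w in EMBEDDINGS if w != query)
    return min(candidates, key=lambda w: math.dist(EMBEDDINGS[w], target))

demos = [("France", "Paris"), ("Japan", "Tokyo")]
print(task_vector(demos), answer("Italy", demos))
```

The demonstrations never mention Italy or Rome; the shared task direction alone carries the query to the right answer.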


【6】Bayesian autoregression to optimize temporal Matérn kernel Gaussian process hyperparameters
标题:贝叶斯自回归优化时间Matérn核高斯过程超参数
链接:https://arxiv.org/abs/2508.09792

作者: Kouw
备注:9 pages, 4 figures, accepted to the International Conference on Probabilistic Numerics 2025
摘要:高斯过程是概率数值领域中的重要模型。我们提出了一种针对核协方差函数的超参数来优化Matérn核时间高斯过程的方法。该方法将优化问题转化为对自回归模型参数的递归贝叶斯估计过程。我们表明,在高斯过程回归中,所提出的方法在运行时间和最终均方根误差两方面都优于最大化边际似然和哈密顿蒙特卡罗采样。
摘要:Gaussian processes are important models in the field of probabilistic numerics. We present a procedure for optimizing Matérn kernel temporal Gaussian processes with respect to the kernel covariance function's hyperparameters. It is based on casting the optimization problem as a recursive Bayesian estimation procedure for the parameters of an autoregressive model. We demonstrate that the proposed procedure outperforms maximizing the marginal likelihood as well as Hamiltonian Monte Carlo sampling, both in terms of runtime and ultimate root mean square error in Gaussian process regression.
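The link between Matérn kernels and autoregressive models is easiest to see in the simplest case. A Matérn-1/2 kernel $k(\tau) = \sigma^2 \exp(-|\tau|/\ell)$ sampled on a regular time grid with step $\Delta$ has the same covariance as a stationary AR(1) process with coefficient $\phi = \exp(-\Delta/\ell)$ and innovation variance $\sigma^2(1-\phi^2)$. The sketch below checks this identity numerically. It covers only this special case and is not the paper's estimation procedure; the hyperparameter values and function names are arbitrary.

```python
import math

def matern12(tau, variance, lengthscale):
    """Matern-1/2 (exponential) kernel k(tau) = variance * exp(-|tau| / lengthscale)."""
    return variance * math.exp(-abs(tau) / lengthscale)

def ar1_parameters(step, variance, lengthscale):
    """AR(1) coefficient and innovation variance matching the kernel on a grid of spacing `step`."""
    phi = math.exp(-step / lengthscale)
    noise_var = variance * (1.0 - phi ** 2)
    return phi, noise_var

def ar1_autocovariance(lag, phi, noise_var):
    """Autocovariance of a stationary AR(1) process at an integer lag."""
    stationary_var = noise_var / (1.0 - phi ** 2)
    return stationary_var * phi ** abs(lag)

variance, lengthscale, step = 2.0, 1.5, 0.1  # arbitrary illustrative hyperparameters
phi, noise_var = ar1_parameters(step, variance, lengthscale)

# Largest mismatch between the AR(1) autocovariance and the kernel over a few lags.
max_gap = max(
    abs(ar1_autocovariance(h, phi, noise_var) - matern12(h * step, variance, lengthscale))
    for h in range(6)
)
print(phi, noise_var, max_gap)
```

Because the kernel hyperparameters map directly to AR parameters here, estimating the latter recursively is one way to recover the former.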


【7】$μ$-Parametrization for Mixture of Experts
标题:$μ$-混合专家的参数化
链接:https://arxiv.org/abs/2508.09752

作者:nicki, Kamil Ciebiera, Mateusz Boruń, Maciej Pióro, Jan Ludziejewski, Maciej Stefaniak, Michał Krutul, Sebastian Jaszczur, Marek Cygan, Kamil Adamczewski, Jakub Krajewski
摘要:近年来,人们对LLM的兴趣和采用越来越多,$\mu$Transfer成为在大规模训练中调整超参数的关键技术。与此同时,专家混合(MoE)已经成为超大型模型中的领先架构。然而,这两个进步的交叉点尚未探索。在这项工作中,我们推导出一个$\mu$-参数化($\mu$P)的MoE,提供理论上的保证跨模型宽度的路由器和专家的功能学习。我们经验验证我们的参数化,并进一步研究如何缩放专家的数量和粒度影响最佳的学习率。
摘要:Recent years have seen a growing interest and adoption of LLMs, with $\mu$Transfer becoming a key technique for tuning hyperparameters in large-scale training. Meanwhile, Mixture-of-Experts (MoE) has emerged as a leading architecture in extremely large models. However, the intersection of these two advancements has remained unexplored. In this work, we derive a $\mu$-Parameterization ($\mu$P) for MoE, providing theoretical guarantees for feature learning across model widths in both the router and experts. We empirically validate our parameterization and further investigate how scaling the number of experts and granularity affects the optimal learning rate.


【8】Combating Noisy Labels via Dynamic Connection Masking
标题:通过动态连接屏蔽对抗噪声标签
链接:https://arxiv.org/abs/2508.09697

作者:ang, Fan Liu, Chuanyi Zhang, Fan Cheng, Yuhui Zheng
摘要:噪声标签在现实场景中不可避免。由于深度神经网络记忆损坏标签的能力很强,这些噪声标签可能导致显著的性能下降。现有关于减轻噪声标签负面影响的研究主要集中在鲁棒损失函数和样本选择上,对模型结构中的正则化的探索相对有限。受Kolmogorov-Arnold网络(KAN)中稀疏正则化的启发,我们提出了一种适用于多层感知器网络(MLP)和KAN的动态连接掩蔽(DCM)机制,以增强分类器对噪声标签的鲁棒性。该机制通过评估各条边的信息承载能力,在训练过程中自适应地掩蔽不太重要的边。通过理论分析,我们证明了它在减小梯度误差方面的有效性。我们的方法可以无缝集成到各种噪声鲁棒训练方法中,包括鲁棒损失函数、样本选择策略和正则化技术,以构建更鲁棒的深度网络。在合成和真实世界基准上进行的大量实验表明,我们的方法始终优于最先进的(SOTA)方法。此外,我们也是第一个研究将KAN用作噪声标签下的分类器的工作,揭示了在真实世界的噪声场景中KAN比MLP具有更强的噪声鲁棒性。我们的代码将很快公开。
摘要:Noisy labels are inevitable in real-world scenarios. Due to the strong capacity of deep neural networks to memorize corrupted labels, these noisy labels can cause significant performance degradation. Existing research on mitigating the negative effects of noisy labels has mainly focused on robust loss functions and sample selection, with comparatively limited exploration of regularization in model architecture. Inspired by the sparsity regularization used in Kolmogorov-Arnold Networks (KANs), we propose a Dynamic Connection Masking (DCM) mechanism for both Multi-Layer Perceptron Networks (MLPs) and KANs to enhance the robustness of classifiers against noisy labels. The mechanism can adaptively mask less important edges during training by evaluating their information-carrying capacity. Through theoretical analysis, we demonstrate its efficiency in reducing gradient error. Our approach can be seamlessly integrated into various noise-robust training methods to build more robust deep networks, including robust loss functions, sample selection strategies, and regularization techniques. Extensive experiments on both synthetic and real-world benchmarks demonstrate that our method consistently outperforms state-of-the-art (SOTA) approaches. Furthermore, we are also the first to investigate KANs as classifiers against noisy labels, revealing their superior noise robustness over MLPs in real-world noisy scenarios. Our code will soon be publicly available.


【9】DeputyDev -- AI Powered Developer Assistant: Breaking the Code Review Logjam through Contextual AI to Boost Developer Productivity
Link: https://arxiv.org/abs/2508.09676

Authors: are, Vijay Saini, Deepak Sharma, Anand Kumar, Ankit Rana, Anshul Yadav
Notes: 12 pages, 5 figures, 6 pages of supplementary materials
Abstract: This study investigates the implementation and efficacy of DeputyDev, an AI-powered code review assistant developed to address inefficiencies in the software development process. Code review is highly inefficient for several reasons: it is time-consuming, feedback is inconsistent, and review quality is often below par. Using our telemetry data, we observed that at TATA 1mg, pull request (PR) processing exhibits significant inefficiencies, with average pick-up and review times of 73 and 82 hours, respectively, resulting in a 6.2-day closure cycle. The review cycle was marked by prolonged iterative communication between the reviewing and submitting parties. Research from the University of California, Irvine indicates that interruptions can lead to an average of 23 minutes of lost focus, critically affecting code quality and timely delivery. To address these challenges, we developed DeputyDev's PR review capabilities, which provide automated, contextual code reviews. We conducted a rigorous double-controlled A/B experiment involving over 200 engineers to evaluate DeputyDev's impact on review times. The results demonstrated a statistically significant reduction in both average per-PR (23.09%) and average per-line-of-code (40.13%) review durations. After implementing safeguards to exclude outliers, DeputyDev has been effectively rolled out across the entire organisation. Additionally, it has been made available to external companies as a Software-as-a-Service (SaaS) solution, currently supporting the daily work of numerous engineering professionals. This study explores the implementation and effectiveness of AI-assisted code reviews in improving development workflow timelines and code.


【10】HierMoE: Accelerating MoE Training with Hierarchical Token Deduplication and Expert Swap
Link: https://arxiv.org/abs/2508.09591

Authors: Lin, Xinglin Pan, Lin Zhang, Shaohuai Shi, Xuan Wang, Xiaowen Chu
Abstract: The sparsely activated mixture-of-experts (MoE) transformer has become a common architecture for large language models (LLMs) due to its sparsity, which reduces computational demands while making it easy to scale the model size. In MoE models, each MoE layer must dynamically route tokens to particular experts for computation, while the activated experts may not be located on the same device or GPU as the token. However, this leads to substantial communication and load imbalances across all GPUs, which obstructs the scalability of distributed systems within a GPU cluster. To this end, we introduce HierMoE to accelerate the training of MoE models by two topology-aware techniques: 1) token deduplication to reduce the communication traffic, and 2) expert swap to balance the workloads among all GPUs. To make these two approaches more general, we build theoretical models aimed at achieving the best token deduplication and expert swap strategy under different model configurations and hardware environments. We implement our prototype HierMoE system atop Megatron-LM and conduct experiments on a 32-GPU cluster with DeepSeek-V3 and Qwen3-30B-A3B models. Experimental results show that our HierMoE achieves $1.55\times$ to $3.32\times$ faster communication and delivers $1.18\times$ to $1.27\times$ faster end-to-end training compared to state-of-the-art MoE training systems, Tutel-2DH, SmartMoE, and Megatron-LM.
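
The token deduplication idea mentioned in the abstract can be illustrated with a toy count. The sketch below is only an illustration under stated assumptions, not HierMoE's implementation: the function name `count_sends`, the expert-to-node layout, and the routing table are all hypothetical. It assumes that a token routed to several experts on the same node needs to be sent to that node only once.

```python
from typing import Dict, List


def count_sends(routing: Dict[int, List[int]],
                expert_to_node: Dict[int, int]) -> Dict[str, int]:
    """Count token copies sent with and without per-node deduplication.

    routing: token id -> expert ids chosen by the router (e.g. top-k).
    expert_to_node: expert id -> node hosting that expert (assumed layout).
    """
    naive = 0    # one copy per (token, expert) pair
    deduped = 0  # one copy per (token, destination node) pair
    for experts in routing.values():
        naive += len(experts)
        deduped += len({expert_to_node[e] for e in experts})
    return {"naive": naive, "deduplicated": deduped}


# Toy example: 4 experts placed on 2 nodes, top-2 routing for 3 tokens.
expert_to_node = {0: 0, 1: 0, 2: 1, 3: 1}
routing = {0: [0, 1], 1: [0, 2], 2: [2, 3]}
stats = count_sends(routing, expert_to_node)
print(stats)
```

In this toy layout, tokens 0 and 2 each pick two experts that share a node, so deduplication saves one copy for each of them; token 1 spans both nodes and saves nothing.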


【11】Emergence of Hierarchies in Multi-Agent Self-Organizing Systems Pursuing a Joint Objective
Link: https://arxiv.org/abs/2508.09541

Authors: , Guoxin Wang, Anton van Beek, Zhenjun Ming, Yan Yan
Notes: 34 pages, 17 figures
Abstract: Multi-agent self-organizing systems (MASOS) exhibit key characteristics including scalability, adaptability, flexibility, and robustness, which have contributed to their extensive application across various fields. However, the self-organizing nature of MASOS also introduces elements of unpredictability in their emergent behaviors. This paper focuses on the emergence of dependency hierarchies during task execution, aiming to understand how such hierarchies arise from agents' collective pursuit of the joint objective, how they evolve dynamically, and what factors govern their development. To investigate this phenomenon, multi-agent reinforcement learning (MARL) is employed to train MASOS for a collaborative box-pushing task. By calculating the gradients of each agent's actions in relation to the states of other agents, the inter-agent dependencies are quantified, and the emergence of hierarchies is analyzed through the aggregation of these dependencies. Our results demonstrate that hierarchies emerge dynamically as agents work towards a joint objective, with these hierarchies evolving in response to changing task requirements. Notably, these dependency hierarchies emerge organically in response to the shared objective, rather than being a consequence of pre-configured rules or parameters that can be fine-tuned to achieve specific results. Furthermore, the emergence of hierarchies is influenced by the task environment and network initialization conditions. Additionally, hierarchies in MASOS emerge from the dynamic interplay between agents' "Talent" and "Effort" within the "Environment." "Talent" determines an agent's initial influence on collective decision-making, while continuous "Effort" within the "Environment" enables agents to shift their roles and positions within the system.
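
The dependency measure described in the abstract (gradients of each agent's action with respect to other agents' states, then aggregated) can be sketched roughly as follows. This is a hedged illustration, not the paper's method: it uses central finite differences instead of automatic differentiation on trained networks, scalar states and actions, and three hypothetical hand-written policies.

```python
import math
from typing import Callable, List

Policy = Callable[[List[float]], float]


def dependency_matrix(policies: List[Policy], states: List[float],
                      eps: float = 1e-5) -> List[List[float]]:
    """dep[i][j] = |d action_i / d state_j|, estimated by central differences."""
    n = len(states)
    dep = [[0.0] * n for _ in range(n)]
    for i, policy in enumerate(policies):
        for j in range(n):
            up = list(states)
            down = list(states)
            up[j] += eps
            down[j] -= eps
            dep[i][j] = abs(policy(up) - policy(down)) / (2 * eps)
    return dep


# Hypothetical toy policies: agent 0 only looks at itself,
# agents 1 and 2 also react to agent 0 (and agent 2 to agent 1).
policies: List[Policy] = [
    lambda s: math.tanh(s[0]),
    lambda s: math.tanh(0.5 * s[1] + 2.0 * s[0]),
    lambda s: math.tanh(s[2] + 1.0 * s[0] + 0.5 * s[1]),
]
dep = dependency_matrix(policies, [0.0, 0.0, 0.0])

# Aggregate: how strongly agent j's state drives the *other* agents' actions.
influence = [sum(dep[i][j] for i in range(3) if i != j) for j in range(3)]
ranking = sorted(range(3), key=lambda j: -influence[j])
print(influence, ranking)
```

Sorting agents by aggregated influence gives one simple notion of a hierarchy; the paper's aggregation may differ.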


【12】CWFBind: Geometry-Awareness for Fast and Accurate Protein-Ligand Docking
Link: https://arxiv.org/abs/2508.09499

Authors: , Chuan-Xian Ren, Hong Yan
Abstract: Accurately predicting the binding conformation of small-molecule ligands to protein targets is a critical step in rational drug design. Although recent deep learning-based docking surpasses traditional methods in speed and accuracy, many approaches rely on graph representations and language model-inspired encoders while neglecting critical geometric information, resulting in inaccurate pocket localization and unrealistic binding conformations. In this study, we introduce CWFBind, a weighted, fast, and accurate docking method based on local curvature features. Specifically, we integrate local curvature descriptors during the feature extraction phase to enrich the geometric representation of both proteins and ligands, complementing existing chemical, sequence, and structural features. Furthermore, we embed degree-aware weighting mechanisms into the message passing process, enhancing the model's ability to capture spatial structural distinctions and interaction strengths. To address the class imbalance challenge in pocket prediction, CWFBind employs a ligand-aware dynamic radius strategy alongside an enhanced loss function, facilitating more precise identification of binding regions and key residues. Comprehensive experimental evaluations demonstrate that CWFBind achieves competitive performance across multiple docking benchmarks, offering a balanced trade-off between accuracy and efficiency.


【13】Resurrecting the Salmon: Rethinking Mechanistic Interpretability with Domain-Specific Sparse Autoencoders
Link: https://arxiv.org/abs/2508.09363

Authors: 'Neill, Mudith Jayasekara, Max Kirkby
Abstract: Sparse autoencoders (SAEs) decompose large language model (LLM) activations into latent features that reveal mechanistic structure. Conventional SAEs train on broad data distributions, forcing a fixed latent budget to capture only high-frequency, generic patterns. This often results in significant linear "dark matter" in reconstruction error and produces latents that fragment or absorb each other, complicating interpretation. We show that restricting SAE training to a well-defined domain (medical text) reallocates capacity to domain-specific features, improving both reconstruction fidelity and interpretability. Training JumpReLU SAEs on layer-20 activations of Gemma-2 models using 195k clinical QA examples, we find that domain-confined SAEs explain up to 20% more variance, achieve higher loss recovery, and reduce linear residual error compared to broad-domain SAEs. Automated and human evaluations confirm that learned features align with clinically meaningful concepts (e.g., "taste sensations" or "infectious mononucleosis"), rather than frequent but uninformative tokens. These domain-specific SAEs capture relevant linear structure, leaving a smaller, more purely nonlinear residual. We conclude that domain-confinement mitigates key limitations of broad-domain SAEs, enabling more complete and interpretable latent decompositions, and suggesting the field may need to question "foundation-model" scaling for general-purpose SAEs.


【14】The First Differentiable Transfer-Based Algorithm for Discrete MicroLED Repair
Link: https://arxiv.org/abs/2508.09206

Authors: Lue
Notes: 15 pages, 7 figures. Presents a differentiable optimization method for laser-enabled MicroLED repair planning, modeling discrete stage shifts in a manufacturing physics context. Includes loss landscape and gradient analyses, with large-array simulation results
Abstract: Laser-enabled selective transfer, a key process in high-throughput microLED fabrication, requires computational models that can plan shift sequences to minimize motion of XY stages and adapt to varying optimization objectives across the substrate. We propose the first repair algorithm based on a differentiable transfer module designed to model discrete shifts of transfer platforms, while remaining trainable via gradient-based optimization. Compared to local proximity searching algorithms, our approach achieves superior repair performance and enables more flexible objective designs, such as minimizing the number of steps. Unlike reinforcement learning (RL)-based approaches, our method eliminates the need for handcrafted feature extractors and trains significantly faster, allowing scalability to large arrays. Experiments show a 50% reduction in transfer steps and sub-2-minute planning time on 2000x2000 arrays. This method provides a practical and adaptable solution for accelerating microLED repair in AR/VR and next-generation display fabrication.


【15】Building Safer Sites: A Large-Scale Multi-Level Dataset for Construction Safety Research
Link: https://arxiv.org/abs/2508.09203

Authors: u, Dawei Li, Zhen Tan, Wenlin Li, Huan Liu, Siyuan Song
Notes: Accepted at CIKM 2025
Abstract: Construction safety research is a critical field in civil engineering, aiming to mitigate risks and prevent injuries through the analysis of site conditions and human factors. However, the limited volume and lack of diversity in existing construction safety datasets pose significant challenges to conducting in-depth analyses. To address this research gap, this paper introduces the Construction Safety Dataset (CSDataset), a well-organized, comprehensive, multi-level dataset that encompasses incident, inspection, and violation records sourced from the Occupational Safety and Health Administration (OSHA). This dataset uniquely integrates structured attributes with unstructured narratives, facilitating a wide range of approaches driven by machine learning and large language models. We also conduct preliminary benchmarking of approaches and various cross-level analyses using our dataset, offering insights to inform and enhance future efforts in construction safety. For example, we found that complaint-driven inspections were associated with a 17.3% reduction in the likelihood of subsequent incidents. Our dataset and code are released at https://github.com/zhenhuiou/Construction-Safety-Dataset-CSDataset.


【16】ADT4Coupons: An Innovative Framework for Sequential Coupon Distribution in E-commerce
Link: https://arxiv.org/abs/2508.09198

Authors: Bingzhe Wang, Zhou Chen, Suhan Hu, Yuchao Ma, Qi Qi, Suoyuan Song, Bicheng Jin
Abstract: Coupon distribution is a critical marketing strategy used by online platforms to boost revenue and enhance user engagement. Regrettably, existing coupon distribution strategies fall far short of effectively leveraging the complex sequential interactions between platforms and users. This critical oversight, despite the abundance of e-commerce log data, has precipitated a performance plateau. In this paper, we focus on the scenario in which platforms make sequential coupon distribution decisions multiple times for various users, with each user interacting with the platform repeatedly. Based on this marketing scenario, we propose a novel marketing framework, named Aligned Decision Transformer for Coupons (ADT4Coupons), to directly devise a coupon distribution policy for long-term revenue boosting. ADT4Coupons enables optimized online decision-making in a variety of real-world marketing scenarios. It achieves this by seamlessly integrating three key characteristics within a unified framework: general scenarios, sequential modeling with more comprehensive historical data, and efficient iterative updates. Furthermore, empirical results on a real-world industrial dataset, alongside public and synthetic datasets, demonstrate the superiority of our framework.


【17】Breath as a biomarker: A survey of contact and contactless applications and approaches in respiratory monitoring
Link: https://arxiv.org/abs/2508.09187

Authors: a A. Wakili, Babajide J. Asaju, Woosub Jung
Abstract: Breath analysis has emerged as a critical tool in health monitoring, offering insights into respiratory function, disease detection, and continuous health assessment. While traditional contact-based methods are reliable, they often pose challenges in comfort and practicality, particularly for long-term monitoring. This survey comprehensively examines contact-based and contactless approaches, emphasizing recent advances in machine learning and deep learning techniques applied to breath analysis. Contactless methods, including Wi-Fi Channel State Information and acoustic sensing, are analyzed for their ability to provide accurate, noninvasive respiratory monitoring. We explore a broad range of applications, from single-user respiratory rate detection to multi-user scenarios, user identification, and respiratory disease detection. Furthermore, this survey details essential data preprocessing, feature extraction, and classification techniques, offering comparative insights into machine learning/deep learning models suited to each approach. Key challenges like dataset scarcity, multi-user interference, and data privacy are also discussed, along with emerging trends like Explainable AI, federated learning, transfer learning, and hybrid modeling. By synthesizing current methodologies and identifying open research directions, this survey offers a comprehensive framework to guide future innovations in breath analysis, bridging advanced technological capabilities with practical healthcare applications.


【18】DQT: Dynamic Quantization Training via Dequantization-Free Nested Integer Arithmetic
Link: https://arxiv.org/abs/2508.09176

Authors: ham Yousef Shalby, Fabrizio Pittorino, Francesca Palermo, Diana Trojaniello, Manuel Roveri
Abstract: The deployment of deep neural networks on resource-constrained devices relies on quantization. While static, uniform quantization applies a fixed bit-width to all inputs, it fails to adapt to their varying complexity. Dynamic, instance-based mixed-precision quantization promises a superior accuracy-efficiency trade-off by allocating higher precision only when needed. However, a critical bottleneck remains: existing methods require a costly dequantize-to-float and requantize-to-integer cycle to change precision, breaking the integer-only hardware paradigm and compromising performance gains. This paper introduces Dynamic Quantization Training (DQT), a novel framework that removes this bottleneck. At the core of DQT is a nested integer representation where lower-precision values are bit-wise embedded within higher-precision ones. This design, coupled with custom integer-only arithmetic, allows for on-the-fly bit-width switching through a near-zero-cost bit-shift operation. This makes DQT the first quantization framework to enable both dequantization-free static mixed-precision of the backbone network, and truly efficient dynamic, instance-based quantization through a lightweight controller that decides at runtime how to quantize each layer. We demonstrate DQT's state-of-the-art performance on ResNet18 on CIFAR-10 and ResNet50 on ImageNet. On ImageNet, our 4-bit dynamic ResNet50 achieves 77.00% top-1 accuracy, an improvement over leading static (LSQ, 76.70%) and dynamic (DQNET, 76.94%) methods at a comparable BitOPs budget. Crucially, DQT achieves this with a bit-width transition cost of only 28.3M simple bit-shift operations, a drastic improvement over the 56.6M costly Multiply-Accumulate (MAC) floating-point operations required by previous dynamic approaches, unlocking a new frontier in efficient, adaptive AI.
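
The nested-integer idea in the abstract (a lower-precision value embedded bit-wise inside a higher-precision one, so precision can be changed with a bit shift) can be illustrated with a small sketch. It is not DQT's actual scheme: it assumes plain unsigned uniform quantization of values in [0, 1) with floor rounding, and the function names are hypothetical.

```python
def quantize(x: float, bits: int = 8) -> int:
    """Uniformly quantize x in [0, 1) to an unsigned integer code with `bits` bits."""
    levels = 1 << bits
    code = int(x * levels)  # floor rounding, since x is non-negative
    return min(max(code, 0), levels - 1)


def reduce_precision(code: int, from_bits: int, to_bits: int) -> int:
    """Switch to a lower bit-width by dropping the least significant bits."""
    return code >> (from_bits - to_bits)


x = 0.7
code8 = quantize(x, 8)                 # 8-bit code
code4 = reduce_precision(code8, 8, 4)  # 4-bit code obtained with a shift only
# Nesting: the shifted code equals what direct 4-bit quantization would give,
# so no dequantize-to-float / requantize round trip is needed in this toy setup.
assert code4 == quantize(x, 4)
print(code8, code4)
```

With floor rounding the top 4 bits of the 8-bit code are exactly the 4-bit code; other rounding modes would need extra care and are outside this sketch.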


【19】Multimodal RAG Enhanced Visual Description
Link: https://arxiv.org/abs/2508.09170

Authors: r Jaiswal, Haiming Liu, Ingo Frommholz
Notes: Accepted by ACM CIKM 2025. 5 pages, 2 figures
Abstract: Textual descriptions for multimodal inputs entail recurrent refinement of queries to produce relevant output images. Despite efforts to address challenges such as scaling model size and data volume, the cost associated with pre-training and fine-tuning remains substantial. However, pre-trained large multimodal models (LMMs) encounter a modality gap, characterised by a misalignment between textual and visual representations within a common embedding space. Although fine-tuning can potentially mitigate this gap, it is typically expensive and impractical due to the requirement for extensive domain-driven data. To overcome this challenge, we propose a lightweight training-free approach utilising Retrieval-Augmented Generation (RAG) to extend across the modality using a linear mapping, which can be computed efficiently. During inference, this mapping is applied to images embedded by an LMM, enabling retrieval of the closest textual descriptions from the training set. These textual descriptions, in conjunction with an instruction, serve as an input prompt for the language model to generate new textual descriptions. In addition, we introduce an iterative technique for distilling the mapping by generating synthetic descriptions via the language model, facilitating optimisation for standard image description measures. Experimental results on two benchmark multimodal datasets demonstrate significant improvements.


【20】Motif 2.6B Technical Report
Link: https://arxiv.org/abs/2508.09148

Authors: Lim, Sungmin Lee, Dongseok Kim, Eunhwan Park, Hyunbyung Park, Junhyeok Lee, Wai Ting Cheung, Dahye Choi, Jaeheui Her, Jaeyeon Huh, Hanbin Jung, Changjin Kang, Beomgyu Kim, Jihwan Kim, Minjae Kim, Taehwan Kim, Youngrok Kim, Haesol Lee, Jeesoo Lee, Kungyu Lee, Dongpin Oh, Yeongjae Park, Bokki Ryu, Daewon Suh, Dongjoo Weon
Abstract: Recent advancements in Large Language Models (LLMs) have revolutionized artificial intelligence, yet developing an effective foundational LLM that balances high performance with computational efficiency remains challenging, especially for emerging research groups. To address this gap, we introduce Motif-2.6B, a 2.6-billion-parameter foundation model designed to democratize advanced LLM capabilities. Motif-2.6B incorporates several innovative architectural enhancements, including Differential Attention and PolyNorm activation functions, which improve long-context comprehension, reduce hallucination, and enhance in-context learning capabilities. We rigorously tested multiple novel architectural components through extensive experimentation to determine the optimal architecture for Motif-2.6B. Comprehensive evaluations demonstrate that Motif-2.6B consistently meets or exceeds the performance of similarly sized state-of-the-art models across diverse benchmarks, showcasing its effectiveness, scalability, and real-world applicability. Through detailed experiments and tailored techniques, Motif-2.6B significantly advances the landscape of efficient, scalable, and powerful foundational LLMs, offering valuable insights and a robust foundation for future research and deployment.


【21】Structured Kernel Regression VAE: A Computationally Efficient Surrogate for GP-VAEs in ICA
Link: https://arxiv.org/abs/2508.09721

Authors: Wei, Fu-Hao Deng, Lin-Yong Cui, Yan-Jie Sun
Abstract: The interpretability of generative models is considered a key factor in demonstrating their effectiveness and controllability. The generated data are believed to be determined by latent variables that are not directly observable. Therefore, disentangling, decoupling, decomposing, causal inference, or performing Independent Component Analysis (ICA) in the latent variable space helps uncover the independent factors that influence the attributes or features affecting the generated outputs, thereby enhancing the interpretability of generative models. As generative models, Variational Autoencoders (VAEs) are combined with variational Bayesian inference algorithms. Using VAEs, the inverse process of ICA can be equivalently framed as a variational inference process. In some studies, Gaussian processes (GPs) have been introduced as priors for each dimension of latent variables in VAEs, structuring and separating each dimension from temporal or spatial perspectives, and encouraging different dimensions to control various attributes of the generated data. However, GPs impose a significant computational burden, resulting in substantial resource consumption when handling large datasets. Essentially, GPs model different temporal or spatial structures through various kernel functions. Structuring the priors of latent variables via kernel functions, so that different kernel functions model the correlations among sequence points within different latent dimensions, is at the core of achieving disentanglement in VAEs. The proposed Structured Kernel Regression VAE (SKR-VAE) leverages this core idea in a more efficient way, avoiding the costly kernel matrix inversion required in GPs. This research demonstrates that, while maintaining ICA performance, SKR-VAE achieves greater computational efficiency and a significantly reduced computational burden compared to GP-VAE.


【22】RadioMamba: Breaking the Accuracy-Efficiency Trade-off in Radio Map Construction via a Hybrid Mamba-UNet
Link: https://arxiv.org/abs/2508.09140

Authors: Jia, Nan Cheng, Xiucheng Wang, Conghao Zhou, Ruijin Sun, Xuemin (Sherman) Shen
Abstract: Radio maps (RMs) have recently attracted much attention since they can provide real-time and accurate spatial channel information for 6G services and applications. However, current deep learning-based methods for RM construction exhibit a well-known accuracy-efficiency trade-off. In this paper, we introduce RadioMamba, a hybrid Mamba-UNet architecture for RM construction that addresses this trade-off. Generally, accurate RM construction requires modeling long-range spatial dependencies, reflecting the global nature of wave propagation physics. RadioMamba utilizes a Mamba-Convolutional block where the Mamba branch captures these global dependencies with linear complexity, while a parallel convolutional branch extracts local features. This hybrid design generates feature representations that capture both global context and local detail. Experiments show that RadioMamba achieves higher accuracy than existing methods, including diffusion models, while operating nearly 20 times faster and using only 2.9% of the model parameters. By improving both accuracy and efficiency, RadioMamba presents a viable approach for real-time intelligent optimization in next-generation wireless systems.


Machine translations (Chinese) provided by Tencent Interactive Translation; for reference only.
