cs.LG: 82 papers in total today
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (4 papers)
【1】 Updating Embeddings for Dynamic Knowledge Graphs
Link: https://arxiv.org/abs/2109.10896
Authors: Christopher Wewer, Florian Lemmerich, Michael Cochez
Affiliations: RWTH Aachen University, Germany; University of Passau, Germany; Vrije Universiteit Amsterdam, the Netherlands. Part of the work was performed while F. Lemmerich and M. Cochez were affiliated with RWTH Aachen and Fraunhofer FIT, respectively.
Abstract: Data in Knowledge Graphs often represents part of the current state of the
real world. Thus, to stay up-to-date the graph data needs to be updated
frequently. To utilize information from Knowledge Graphs, many state-of-the-art
machine learning approaches use embedding techniques. These techniques
typically compute an embedding, i.e., vector representations of the nodes as
input for the main machine learning algorithm. If a graph update occurs later
on -- specifically when nodes are added or removed -- the training has to be
done all over again. This is undesirable, because of the time it takes and also
because downstream models which were trained with these embeddings have to be
retrained if they change significantly. In this paper, we investigate embedding
updates that do not require full retraining and evaluate them in combination
with various embedding models on real dynamic Knowledge Graphs covering
multiple use cases. We study approaches that place newly appearing nodes
optimally according to local information, but notice that this does not work
well. However, we find that if we continue the training of the old embedding,
interleaved with epochs during which we only optimize for the added and removed
parts, we obtain good results in terms of typical metrics used in link
prediction. This performance is obtained much faster than with a complete
retraining and hence makes it possible to maintain embeddings for dynamic
Knowledge Graphs.
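The interleaving idea in this abstract lends itself to a compact illustration. Below is a minimal sketch only, using a toy TransE-style update; the function names, the alternation period, and the omission of negative sampling are simplifying assumptions, not the authors' implementation (which covers several embedding models):
```python
import numpy as np

def transe_epoch(E, R, triples, lr=0.01):
    """One toy TransE epoch: push ||E[h] + R[r] - E[t]|| toward zero.
    (A real setup would also use negative samples and a margin loss.)"""
    for h, r, t in triples:
        grad = 2 * (E[h] + R[r] - E[t])  # gradient of the squared distance
        E[h] -= lr * grad
        R[r] -= lr * grad
        E[t] += lr * grad

def update_after_graph_change(E, R, all_triples, changed_triples, epochs=10):
    """Continue training the old embedding, interleaved with epochs that
    only optimize the parts touched by added/removed nodes, instead of
    retraining from scratch."""
    for epoch in range(epochs):
        batch = all_triples if epoch % 2 == 0 else changed_triples
        transe_epoch(E, R, batch)
    return E, R

# Toy usage: 5 entities, 2 relations, one triple introduced by a new node.
rng = np.random.default_rng(0)
E, R = rng.normal(size=(5, 8)), rng.normal(size=(2, 8))
old = [(0, 0, 1), (1, 1, 2), (2, 0, 3)]
new = [(3, 1, 4)]
update_after_graph_change(E, R, old + new, new)
```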
【2】 Towards Automatic Bias Detection in Knowledge Graphs
Link: https://arxiv.org/abs/2109.10697
Authors: Daphna Keidar, Mian Zhong, Ce Zhang, Yash Raj Shrestha, Bibek Paudel
Affiliations: ETH Zürich, Switzerland; Stanford University, U.S.A.
Comments: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: Findings (EMNLP 2021). Nov 7--11, 2021
Abstract: With the recent surge in social applications relying on knowledge graphs, the
need for techniques to ensure fairness in KG based methods is becoming
increasingly evident. Previous works have demonstrated that KGs are prone to
various social biases, and have proposed multiple methods for debiasing them.
However, in such studies, the focus has been on debiasing techniques, while the
relations to be debiased are specified manually by the user. As manual
specification is itself susceptible to human cognitive bias, there is a need
for a system capable of quantifying and exposing biases, that can support more
informed decisions on what to debias. To address this gap in the literature, we
describe a framework for identifying biases present in knowledge graph
embeddings, based on numerical bias metrics. We illustrate the framework with
three different bias measures on the task of profession prediction, and it can
be flexibly extended to further bias definitions and applications. The
relations flagged as biased can then be handed to decision makers for judgement
upon subsequent debiasing.
【3】 Solving Large Steiner Tree Problems in Graphs for Cost-Efficient Fiber-To-The-Home Network Expansion
Link: https://arxiv.org/abs/2109.10617
Authors: Tobias Müller, Kyrill Schmid, Daniëlle Schuman, Thomas Gabor, Markus Friedrich, Marc Geitz
Affiliations: Mobile and Distributed Systems Group, LMU Munich, Germany; Telekom Innovation Laboratories, Deutsche Telekom AG, Bonn, Germany
Comments: Submitted to ICAART 2022, 10 pages, 18 figures
Abstract: The expansion of Fiber-To-The-Home (FTTH) networks creates high costs due to
expensive excavation procedures. Optimizing the planning process and minimizing
the cost of the earth excavation work therefore lead to large savings.
Mathematically, the FTTH network problem can be described as a minimum Steiner
Tree problem. Even though the Steiner Tree problem has already been
investigated intensively in the last decades, it might be further optimized
with the help of new computing paradigms and emerging approaches. This work
studies upcoming technologies, such as Quantum Annealing, Simulated Annealing
and nature-inspired methods like Evolutionary Algorithms or slime-mold-based
optimization. Additionally, we investigate partitioning and simplifying
methods. Evaluated on several real-life problem instances, we could outperform
a traditional, widely-used baseline (NetworkX Approximate Solver) on most of
the domains. Prior partitioning of the initial graph and the presented
slime-mold-based approach were especially valuable for a cost-efficient
approximation. Quantum Annealing seems promising, but was limited by the number
of available qubits.
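The baseline named in the abstract, NetworkX's approximate Steiner tree solver, can be exercised in a few lines. The toy street graph below is invented for illustration; edge weights stand in for excavation costs and the terminals for households:
```python
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

# Junctions as nodes, excavation costs as edge weights (made-up numbers).
G = nx.Graph()
G.add_weighted_edges_from([
    ("a", "b", 4), ("b", "c", 3), ("a", "d", 2),
    ("d", "c", 2), ("c", "e", 5), ("d", "e", 7),
])
households = ["a", "c", "e"]  # terminals that must be connected

# The approximate solver used as the paper's baseline.
T = steiner_tree(G, households, weight="weight")
print(sorted(T.edges(data="weight")))
```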
【4】 Learning through structure: towards deep neuromorphic knowledge graph embeddings
Link: https://arxiv.org/abs/2109.10376
Authors: Victor Caceres Chian, Marcel Hildebrandt, Thomas Runkler, Dominik Dold
Affiliations: Siemens AG Technology, Munich, Germany
Comments: Accepted for publication at the International Conference on Neuromorphic Computing (ICNC 2021)
Abstract: Computing latent representations for graph-structured data is a ubiquitous
learning task in many industrial and academic applications ranging from
molecule synthetization to social network analysis and recommender systems.
Knowledge graphs are among the most popular and widely used data
representations related to the Semantic Web. Next to structuring factual
knowledge in a machine-readable format, knowledge graphs serve as the backbone
of many artificial intelligence applications and allow the ingestion of context
information into various learning algorithms. Graph neural networks attempt to
encode graph structures in low-dimensional vector spaces via a message passing
heuristic between neighboring nodes. Over the recent years, a multitude of
different graph neural network architectures demonstrated ground-breaking
performances in many learning tasks. In this work, we propose a strategy to map
deep graph learning architectures for knowledge graph reasoning to neuromorphic
architectures. Based on the insight that randomly initialized and untrained
(i.e., frozen) graph neural networks are able to preserve local graph
structures, we compose a frozen neural network with shallow knowledge graph
embedding models. We experimentally show that already on conventional computing
hardware, this leads to a significant speedup and memory reduction while
maintaining a competitive performance level. Moreover, we extend the frozen
architecture to spiking neural networks, introducing a novel, event-based and
highly sparse knowledge graph embedding algorithm that is suitable for
implementation in neuromorphic hardware.
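The core trick, keeping a randomly initialized message-passing layer frozen while only a shallow embedding is trainable, can be sketched as follows. The graph, dimensions, and tanh nonlinearity are placeholder choices, not the architecture from the paper:
```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, d_in, d_out = 100, 16, 16

A = (rng.random((n_nodes, n_nodes)) < 0.05).astype(float)  # toy adjacency
A += np.eye(n_nodes)                                       # self-loops
deg_inv = 1.0 / A.sum(axis=1, keepdims=True)               # row-normalization

X = rng.normal(size=(n_nodes, d_in))  # shallow embeddings (the trainable part)
W = rng.normal(size=(d_in, d_out))    # random, frozen message-passing weights

# One frozen message-passing round: neighbors are aggregated and projected
# through W, which is never updated; only X would be optimized downstream.
H = np.tanh(deg_inv * (A @ X) @ W)
```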
Transformer (1 paper)
【1】 Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
Link: https://arxiv.org/abs/2109.10686
Authors: Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler
Affiliations: Google Research & DeepMind
Abstract: There remain many open questions pertaining to the scaling behaviour of
Transformer architectures. These scaling decisions and findings can be
critical, as training runs often come with an associated computational cost
which has financial and/or environmental impact. The goal of this paper
is to present scaling insights from pretraining and finetuning Transformers.
While Kaplan et al. present a comprehensive study of the scaling behaviour of
Transformer language models, the scope is only on the upstream (pretraining)
loss. Therefore, it is still unclear whether this set of findings transfers to
downstream tasks within the context of the pretrain-finetune paradigm. The key
findings of this paper are as follows: (1) we show that aside from only the
model size, model shape matters for downstream fine-tuning, (2) scaling
protocols operate differently at different compute regions, (3) widely adopted
T5-base and T5-large sizes are Pareto-inefficient. To this end, we present
improved scaling protocols whereby our redesigned models achieve similar
downstream fine-tuning quality while having 50\% fewer parameters and training
40\% faster compared to the widely adopted T5-base model. We publicly release
over 100 pretrained checkpoints of different T5 configurations to facilitate
future research and analysis.
GAN | adversarial | attacks | generation (3 papers)
【1】 Exploring Adversarial Examples for Efficient Active Learning in Machine Learning Classifiers
Link: https://arxiv.org/abs/2109.10770
Authors: Honggang Yu, Shihfeng Zeng, Teng Zhang, Ing-Chao Lin, Yier Jin
Affiliations: University of Florida; National Cheng Kung University; University of Central Florida
Abstract: Machine learning researchers have long noticed the phenomenon that the model
training process will be more effective and efficient when the training samples
are densely sampled around the underlying decision boundary. While this
observation has already been widely applied in a range of machine learning
security techniques, it lacks a theoretical analysis of the correctness of the
observation. To address this challenge, we first add particular perturbations to
original training examples using adversarial attack methods so that the
generated examples could lie approximately on the decision boundary of the ML
classifiers. We then investigate the connections between active learning and
these particular training examples. Through analyzing various representative
classifiers such as k-NN classifiers, kernel methods as well as deep neural
networks, we establish a theoretical foundation for the observation. As a
result, our theoretical proofs provide support to more efficient active
learning methods with the help of adversarial examples, contrary to previous
works where adversarial examples are often used as destructive solutions.
Experimental results show that the established theoretical foundation will
guide better active learning strategies based on adversarial examples.
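For a linear classifier, the connection drawn in the abstract is explicit: the minimal adversarial perturbation of a point equals its distance to the decision boundary, so querying points with small perturbations is margin-based active learning. A sketch under that linear-model assumption (DeepFool reduces to this closed form in the linear case):
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

def distance_to_boundary(clf, X):
    """L2 norm of the minimal perturbation that puts each point on the
    decision boundary of a linear classifier."""
    w, b = clf.coef_[0], clf.intercept_[0]
    return np.abs(X @ w + b) / np.linalg.norm(w)

# Active-learning query: label the pool points whose adversarial
# perturbation is smallest, i.e. those closest to the boundary.
X, y = make_classification(n_samples=200, random_state=0)
X_lab, y_lab, X_pool = X[:50], y[:50], X[50:]
clf = LinearSVC().fit(X_lab, y_lab)
query_idx = np.argsort(distance_to_boundary(clf, X_pool))[:10]
```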
【2】 Backdoor Attacks on Federated Learning with Lottery Ticket Hypothesis
Link: https://arxiv.org/abs/2109.10512
Authors: Zeyuan Yin, Ye Yuan, Panfeng Guo, Pan Zhou
Affiliations: Huazhong University of Science and Technology
Abstract: Edge devices in federated learning usually have much more limited computation
and communication resources compared to servers in a data center. Recently,
advanced model compression methods, like the Lottery Ticket Hypothesis, have
already been implemented on federated learning to reduce the model size and
communication cost. However, Backdoor Attack can compromise its implementation
in the federated learning scenario. The malicious edge device trains the client
model with poisoned private data and uploads parameters to the center,
embedding a backdoor into the global shared model after unwitting aggregative
optimization. During the inference phase, the model with backdoors classifies
samples with a certain trigger as one target category, while showing a slight
decrease in inference accuracy on clean samples. In this work, we empirically
demonstrate that Lottery Ticket models are equally vulnerable to backdoor
attacks as the original dense models, and backdoor attacks can influence the
structure of extracted tickets. Based on tickets' similarities between each
other, we provide a feasible defense for federated learning against backdoor
attacks on various datasets.
【3】 Generating Compositional Color Representations from Text
Link: https://arxiv.org/abs/2109.10477
Authors: Paridhi Maheshwari, Nihal Jain, Praneetha Vaddamanu, Dhananjay Raut, Shraiysh Vaishay, Vishwa Vinay
Affiliations: Stanford University; Carnegie Mellon University; Adobe Research; Advanced Micro Devices Inc.
Comments: Accepted as a full paper at CIKM 2021
Abstract: We consider the cross-modal task of producing color representations for text
phrases. Motivated by the fact that a significant fraction of user queries on
an image search engine follow an (attribute, object) structure, we propose a
generative adversarial network that generates color profiles for such bigrams.
We design our pipeline to learn composition - the ability to combine seen
attributes and objects into unseen pairs. We propose a novel dataset curation
pipeline from existing public sources. We describe how a set of phrases of
interest can be compiled using a graph propagation technique, and then mapped
to images. While this dataset is specialized for our investigations on color,
the method can be extended to other visual dimensions where composition is of
interest. We provide detailed ablation studies that test the behavior of our
GAN architecture with loss functions from the contrastive learning literature.
We show that the generative model achieves lower Frechet Inception Distance
than discriminative ones, and therefore predicts color profiles that better
match those from real images. Finally, we demonstrate improved performance in
image retrieval and classification, indicating the crucial role that color
plays in these downstream tasks.
Semi-/weakly-/un-/fully-supervised | uncertainty | active learning (5 papers)
【1】 Quantifying Model Predictive Uncertainty with Perturbation Theory
Link: https://arxiv.org/abs/2109.10888
Authors: Rishabh Singh, Jose C. Principe
Affiliations: Computational NeuroEngineering Lab (CNEL), University of Florida, Gainesville, Florida, USA
Comments: 16 pages, 12 figures, 4 tables. arXiv admin note: text overlap with arXiv:2103.01374
Abstract: We propose a framework for predictive uncertainty quantification of a neural
network that replaces the conventional Bayesian notion of weight probability
density function (PDF) with a physics based potential field representation of
the model weights in a Gaussian reproducing kernel Hilbert space (RKHS)
embedding. This allows us to use perturbation theory from quantum physics to
formulate a moment decomposition problem over the model weight-output
relationship. The extracted moments reveal successive degrees of regularization
of the weight potential field around the local neighborhood of the model
output. Such localized moments represent well the PDF tails and provide
significantly greater accuracy of the model's predictive uncertainty than the
central moments characterized by Bayesian and ensemble methods or their
variants. We show that this consequently leads to a better ability to detect
false model predictions of test data that has undergone a covariate shift away
from the training PDF learned by the model. We evaluate our approach against
baseline uncertainty quantification methods on several benchmark datasets that
are corrupted using common distortion techniques. Our approach provides fast
model predictive uncertainty estimates with much greater precision and
calibration.
【2】 Unsupervised Movement Detection in Indoor Positioning Systems
Link: https://arxiv.org/abs/2109.10757
Authors: Jonathan Flossdorf, Anne Meyer, Dmitri Artjuch, Jaques Schneider, Carsten Jentsch
Affiliations: Department of Statistics, TU Dortmund University; Faculty of Mechanical Engineering, TU Dortmund University; Arnold AG
Abstract: In recent years, the usage of indoor positioning systems for manufacturing
processes became increasingly popular. Typically, the production hall is
equipped with satellites which receive position data of sensors that can be
pinned on components, load carriers or industrial trucks. This enables a
company e.g. to reduce search efforts and to optimize individual system
processes. In our research context, a sensor only sends position information
when it is moved. However, various circumstances frequently cause data to be
sent undesirably, e.g. due to disrupting factors nearby. This has a negative
impact on the data quality, the energy consumption, and the reliability of the
whole system. Motivated by this, we aim to distinguish between actual movements
and signals that were undesirably sent, which is particularly challenging due
to the susceptibility of indoor systems to noise and measurement errors.
Therefore, we propose two novel unsupervised classification algorithms suitable
for this task. Depending on the question of interest, they rely either on a
distance-based or on a time-based criterion, which makes it possible to use all
essential information. Furthermore, we propose an approach to combine both
classifications and to aggregate them on spatial production areas. This enables
us to generate a comprehensive map of the underlying production hall with the
sole usage of the position data. Aside from the analysis and detection of the
underlying movement structure, the user benefits from a better understanding of
their own system processes and from the detection of problematic system areas,
which leads to a more efficient usage of positioning systems. Since all our
approaches are constructed with unsupervised techniques, they are readily
applicable in practice and do not require more information than the output data
of the positioning system.
【3】 Unsupervised Contextualized Document Representation
Link: https://arxiv.org/abs/2109.10509
Authors: Ankur Gupta, Vivek Gupta
Affiliations: Indian Institute of Technology, Kanpur; School of Computing, University of Utah
Comments: 9 pages, 4 figures, 7 tables, SustaiNLP2021 @ EMNLP-2021
Abstract: Several NLP tasks need the effective representation of text documents. Arora
et al. (2017) demonstrate that simple weighted averaging of word vectors
frequently outperforms neural models. SCDV (Mekala et al., 2017) further
extends this from sentences to documents by employing soft and sparse
clustering over pre-computed word vectors. However, both techniques ignore the
polysemy and contextual character of words. In this paper, we address this
issue by proposing SCDV+BERT(ctxd), a simple and effective unsupervised
representation that combines contextualized BERT (Devlin et al., 2019) based
word embedding for word sense disambiguation with SCDV soft clustering
approach. We show that our embeddings outperform original SCDV, pre-trained BERT,
and several other baselines on many classification datasets. We also
demonstrate our embeddings' effectiveness on other tasks, such as concept
matching and sentence similarity. In addition, we show that SCDV+BERT(ctxd)
outperforms fine-tuned BERT and different embedding approaches in scenarios with
limited data and only few-shot examples.
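The SCDV side of the construction can be sketched with scikit-learn's Gaussian mixture for the soft clustering step. The random vectors below stand in for the sense-disambiguated BERT embeddings the paper uses, and the weighting details are simplified assumptions:
```python
import numpy as np
from sklearn.mixture import GaussianMixture

def scdv_like(word_vecs, idf, gmm):
    """Document vector: weight each (contextual) word vector by its soft
    cluster memberships and idf, average over words, and concatenate
    the per-cluster pieces (a simplified reading of SCDV)."""
    probs = gmm.predict_proba(word_vecs)                # (n_words, k)
    pieces = probs[:, :, None] * word_vecs[:, None, :]  # (n_words, k, d)
    doc = (idf[:, None, None] * pieces).mean(axis=0).reshape(-1)
    return doc / (np.linalg.norm(doc) + 1e-9)

rng = np.random.default_rng(0)
corpus_vecs = rng.normal(size=(500, 32))  # placeholders for BERT vectors
gmm = GaussianMixture(n_components=4, random_state=0).fit(corpus_vecs)
doc_vec = scdv_like(corpus_vecs[:20], np.ones(20), gmm)
```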
【4】 Self-Supervised Learning to Prove Equivalence Between Programs via Semantics-Preserving Rewrite Rules
Link: https://arxiv.org/abs/2109.10476
Authors: Steve Kommrusch, Martin Monperrus, Louis-Noël Pouchet
Affiliations: Martin Monperrus is with KTH Royal Institute of Technology
Comments: 18 pages
Abstract: We target the problem of synthesizing proofs of semantic equivalence between
two programs made of sequences of statements with complex symbolic expressions.
We propose a neural network architecture based on the transformer to generate
axiomatic proofs of equivalence between program pairs. We generate expressions
which include scalars and vectors and support multi-typed rewrite rules to
prove equivalence. For training the system, we develop an original training
technique, which we call self-supervised sample selection. This incremental
training improves the quality, generalizability and extensibility of the
learned model. We study the effectiveness of the system to generate proofs of
increasing length, and we demonstrate how transformer models learn to represent
complex and verifiable symbolic reasoning. Our system, S4Eq, achieves 97% proof
success on 10,000 pairs of programs while ensuring zero false positives by
design.
【5】 Uncertainty-Aware Training for Cardiac Resynchronisation Therapy Response Prediction
Link: https://arxiv.org/abs/2109.10641
Authors: Tareen Dawood, Chen Chen, Robin Andlauer, Baldeep S. Sidhu, Bram Ruijsink, Justin Gould, Bradley Porter, Mark Elliott, Vishal Mehta, C. Aldo Rinaldi, Esther Puyol-Antón, Reza Razavi, Andrew P. King
Affiliations: School of Biomedical Engineering & Imaging Sciences, King's College London, UK; Guy's and St Thomas' Hospital, London, UK; BioMedIA Group, Department of Computing, Imperial College London, UK
Comments: STACOM 2021 Workshop
Abstract: Evaluation of predictive deep learning (DL) models beyond conventional
performance metrics has become increasingly important for applications in
sensitive environments like healthcare. Such models might have the capability
to encode and analyse large sets of data but they often lack comprehensive
interpretability methods, preventing clinical trust in predictive outcomes.
Quantifying uncertainty of a prediction is one way to provide such
interpretability and promote trust. However, relatively little attention has
been paid to how to incorporate such requirements into the training of the model.
In this paper we: (i) quantify the data (aleatoric) and model (epistemic)
uncertainty of a DL model for Cardiac Resynchronisation Therapy response
prediction from cardiac magnetic resonance images, and (ii) propose and perform
a preliminary investigation of an uncertainty-aware loss function that can be
used to retrain an existing DL image-based classification model to encourage
confidence in correct predictions and reduce confidence in incorrect
predictions. Our initial results are promising, showing a significant increase
in the (epistemic) confidence of true positive predictions, with some evidence
of a reduction in false negative confidence.
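One plausible form of such a loss, written only to make the idea concrete (the paper's exact formulation may differ), adds a shaping term to binary cross-entropy that penalizes confident wrong predictions and under-confident correct ones:
```python
import numpy as np

def uncertainty_aware_loss(p, y, lam=0.5):
    """Binary cross-entropy plus an illustrative confidence-shaping term:
    high confidence on wrong predictions and low confidence on correct
    ones both increase the loss."""
    eps = 1e-9
    ce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    correct = ((p > 0.5).astype(float) == y).astype(float)
    conf = 2 * np.abs(p - 0.5)  # 0 = maximally uncertain, 1 = fully confident
    shaping = np.mean(correct * (1 - conf) + (1 - correct) * conf)
    return ce + lam * shaping
```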
Transfer | zero/few/one-shot | adaptation (1 paper)
【1】 Adaptive Neural Message Passing for Inductive Learning on Hypergraphs
Link: https://arxiv.org/abs/2109.10683
Authors: Devanshu Arya, Deepak K. Gupta, Stevan Rudinac, Marcel Worring
Affiliations: Stevan Rudinac and Marcel Worring are with the University of Amsterdam; Deepak Gupta is with Indian Institute of Technology Dhanbad
Abstract: Graphs are the most ubiquitous data structures for representing relational
datasets and performing inferences in them. They model, however, only pairwise
relations between nodes and are not designed for encoding the higher-order
relations. This drawback is mitigated by hypergraphs, in which an edge can
connect an arbitrary number of nodes. Most hypergraph learning approaches
convert the hypergraph structure to that of a graph and then deploy existing
geometric deep learning methods. This transformation leads to information loss,
and sub-optimal exploitation of the hypergraph's expressive power. We present
HyperMSG, a novel hypergraph learning framework that uses a modular two-level
neural message passing strategy to accurately and efficiently propagate
information within each hyperedge and across the hyperedges. HyperMSG adapts to
the data and task by learning an attention weight associated with each node's
degree centrality. Such a mechanism quantifies both local and global importance
of a node, capturing the structural properties of a hypergraph. HyperMSG is
inductive, allowing inference on previously unseen nodes. Further, it is robust
and outperforms state-of-the-art hypergraph learning methods on a wide range of
tasks and datasets. Finally, we demonstrate the effectiveness of HyperMSG in
learning multimodal relations through detailed experimentation on a challenging
multimedia dataset.
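The two-level message passing over a hypergraph can be illustrated with incidence-matrix algebra; the dimensions and the mean aggregator below are placeholder choices, and the attention weighting over degree centrality is omitted:
```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_edges, d = 6, 3, 8
# Incidence matrix: H[v, e] = 1 if node v belongs to hyperedge e.
H = (rng.random((n_nodes, n_edges)) < 0.5).astype(float)
X = rng.normal(size=(n_nodes, d))

# Level 1: aggregate node features within each hyperedge.
edge_msg = (H.T @ X) / np.maximum(H.sum(axis=0)[:, None], 1)
# Level 2: send hyperedge messages back to their member nodes.
node_out = (H @ edge_msg) / np.maximum(H.sum(axis=1)[:, None], 1)
```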
Reinforcement learning (7 papers)
【1】 A Workflow for Offline Model-Free Robotic Reinforcement Learning
Link: https://arxiv.org/abs/2109.10813
Authors: Aviral Kumar, Anikait Singh, Stephen Tian, Chelsea Finn, Sergey Levine
Affiliations: UC Berkeley; Stanford University (* equal contribution)
Comments: CoRL 2021. Project website: this https URL. First two authors contributed equally.
Abstract: Offline reinforcement learning (RL) enables learning control policies by
utilizing only prior experience, without any online interaction. This can allow
robots to acquire generalizable skills from large and diverse datasets, without
any costly or unsafe online data collection. Despite recent algorithmic
advances in offline RL, applying these methods to real-world problems has
proven challenging. Although offline RL methods can learn from prior data,
there is no clear and well-understood process for making various design
choices, from model architecture to algorithm hyperparameters, without actually
evaluating the learned policies online. In this paper, our aim is to develop a
practical workflow for using offline RL analogous to the relatively
well-understood workflows for supervised learning problems. To this end, we
devise a set of metrics and conditions that can be tracked over the course of
offline training, and can inform the practitioner about how the algorithm and
model architecture should be adjusted to improve final performance. Our
workflow is derived from a conceptual understanding of the behavior of
conservative offline RL algorithms and cross-validation in supervised learning.
We demonstrate the efficacy of this workflow in producing effective policies
without any online tuning, both in several simulated robotic learning scenarios
and for three tasks on two distinct real robots, focusing on learning
manipulation skills with raw image observations with sparse binary rewards.
Explanatory video and additional results can be found at
sites.google.com/view/offline-rl-workflow
【2】 Introducing Symmetries to Black Box Meta Reinforcement Learning
Link: https://arxiv.org/abs/2109.10781
Authors: Louis Kirsch, Sebastian Flennerhag, Hado van Hasselt, Abram Friesen, Junhyuk Oh, Yutian Chen
Affiliations: DeepMind; The Swiss AI Lab IDSIA, USI, SUPSI
Abstract: Meta reinforcement learning (RL) attempts to discover new RL algorithms
automatically from environment interaction. In so-called black-box approaches,
the policy and the learning algorithm are jointly represented by a single
neural network. These methods are very flexible, but they tend to underperform
in terms of generalisation to new, unseen environments. In this paper, we
explore the role of symmetries in meta-generalisation. We show that a recent
successful meta RL approach that meta-learns an objective for
backpropagation-based learning exhibits certain symmetries (specifically the
reuse of the learning rule, and invariance to input and output permutations)
that are not present in typical black-box meta RL systems. We hypothesise that
these symmetries can play an important role in meta-generalisation. Building
off recent work in black-box supervised meta learning, we develop a black-box
meta RL system that exhibits these same symmetries. We show through careful
experimentation that incorporating these symmetries can lead to algorithms with
a greater ability to generalise to unseen action & observation spaces, tasks,
and environments.
【3】 Estimation Error Correction in Deep Reinforcement Learning for Deterministic Actor-Critic Methods
Link: https://arxiv.org/abs/2109.10736
Authors: Baturay Saglam, Enes Duran, Dogan C. Cicek, Furkan B. Mutlu, Suleyman S. Kozat
Affiliations: Department of Electrical and Electronics Engineering, Bilkent University, Ankara, Turkey
Comments: Accepted at ICTAI 2021
Abstract: In value-based deep reinforcement learning methods, approximation of value
functions induces overestimation bias and leads to suboptimal policies. We show
that in deep actor-critic methods that aim to overcome the overestimation bias,
if the reinforcement signals received by the agent have a high variance, a
significant underestimation bias arises. To minimize the underestimation, we
introduce a parameter-free, novel deep Q-learning variant. Our Q-value update
rule combines the notions behind Clipped Double Q-learning and Maxmin
Q-learning by computing the critic objective through the nested combination of
maximum and minimum operators to bound the approximate value estimates. We
evaluate our modification on the suite of several OpenAI Gym continuous control
tasks, improving the state-of-the-art in every environment tested.
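The nested operator target can be written down directly; the grouping of critics and the order of the operators below are one reading of the abstract, not necessarily the paper's exact configuration:
```python
import numpy as np

def critic_target(rewards, next_q, gamma=0.99):
    """Bound the bootstrap value with a nested combination of operators:
    min within each critic group (as in Clipped Double Q-learning) and
    max across groups (as in Maxmin Q-learning).

    next_q: (n_groups, n_critics, batch) Q-estimates for the next state."""
    bounded = next_q.min(axis=1).max(axis=0)
    return rewards + gamma * bounded

rng = np.random.default_rng(0)
targets = critic_target(rng.normal(size=32), rng.normal(size=(2, 2, 32)))
```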
【4】 Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning
Link: https://arxiv.org/abs/2109.10632
Authors: Roy Zohar, Shie Mannor, Guy Tennenholtz
Affiliations: Hebrew University of Jerusalem; Nvidia Research; Technion Institute of Technology
Abstract: Cooperative multi-agent reinforcement learning (MARL) faces significant
scalability issues due to state and action spaces that are exponentially large
in the number of agents. As environments grow in size, effective credit
assignment becomes increasingly harder and often results in infeasible learning
times. Still, in many real-world settings, there exist simplified underlying
dynamics that can be leveraged for more scalable solutions. In this work, we
exploit such locality structures effectively whilst maintaining global
cooperation. We propose a novel, value-based multi-agent algorithm called
LOMAQ, which incorporates local rewards in the Centralized Training
Decentralized Execution paradigm. Additionally, we provide a direct reward
decomposition method for finding these local rewards when only a global signal
is provided. We test our method empirically, showing it scales well compared to
other methods, significantly improving performance and convergence speed.
【5】 MEPG: A Minimalist Ensemble Policy Gradient Framework for Deep Reinforcement Learning
Link: https://arxiv.org/abs/2109.10552
Authors: Qiang He, Chen Gong, Yuxun Qu, Xiaoyu Chen, Xinwen Hou, Yu Liu
Affiliations: Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Abstract: Ensemble reinforcement learning (RL) aims to mitigate instability in
Q-learning and to learn a robust policy, which introduces multiple value and
policy functions. In this paper, we consider finding a novel but simple
ensemble Deep RL algorithm to solve the resource consumption issue.
Specifically, we consider integrating multiple models into a single model. To
this end, we propose the Minimalist Ensemble Policy Gradient framework (MEPG),
which introduces a minimalist ensemble-consistent Bellman update. We find that
one value network is
sufficient in our framework. Moreover, we theoretically show that the policy
evaluation phase in the MEPG is mathematically equivalent to a deep Gaussian
Process. To verify the effectiveness of the MEPG framework, we conduct
experiments on the gym simulator, which show that the MEPG framework matches or
outperforms the state-of-the-art ensemble methods and model-free methods
without additional computational resource costs.
【6】 Benchmarking Lane-changing Decision-making for Deep Reinforcement Learning
Link: https://arxiv.org/abs/2109.10490
Authors: Junjie Wang, Qichao Zhang, Dongbin Zhao
Affiliations: The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; College of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Comments: 10 pages, 5 figures, 3 tables
Abstract: The development of autonomous driving has attracted extensive attention in
recent years, and it is essential to evaluate the performance of autonomous
driving. However, testing on the road is expensive and inefficient. Virtual
testing is the primary way to validate and verify self-driving cars, and the
basis of virtual testing is to build simulation scenarios. In this paper, we
propose a training, testing, and evaluation pipeline for the lane-changing task
from the perspective of deep reinforcement learning. First, we design lane
change scenarios for training and testing, where the test scenarios include
stochastic and deterministic parts. Then, we deploy a set of benchmarks
consisting of learning and non-learning approaches. We train several
state-of-the-art deep reinforcement learning methods in the designed training
scenarios and provide the benchmark metrics evaluation results of the trained
models in the test scenarios. The designed lane-changing scenarios and
benchmarks are both made publicly available to provide a consistent
experimental environment for
the lane-changing task.
【7】 Deep Policies for Online Bipartite Matching: A Reinforcement Learning Approach
Link: https://arxiv.org/abs/2109.10380
Authors: Mohammad Ali Alomrani, Reza Moravej, Elias B. Khalil
Affiliations: Department of Electrical & Computer Engineering, University of Toronto; Department of Computer Science; Department of Mechanical & Industrial Engineering; SCALE AI Research Chair in Data-Driven Algorithms for Modern Supply Chains
Abstract: From assigning computing tasks to servers and advertisements to users,
sequential online matching problems arise in a wide variety of domains. The
challenge in online matching lies in making irrevocable assignments while there
is uncertainty about future inputs. In the theoretical computer science
literature, most policies are myopic or greedy in nature. In real-world
applications where the matching process is repeated on a regular basis, the
underlying data distribution can be leveraged for better decision-making. We
present an end-to-end Reinforcement Learning framework for deriving better
matching policies based on trial-and-error on historical data. We devise a set
of neural network architectures, design feature representations, and
empirically evaluate them across two online matching problems: Edge-Weighted
Online Bipartite Matching and Online Submodular Bipartite Matching. We show
that most of the learning approaches perform significantly better than
classical greedy algorithms on four synthetic and real-world datasets. Our code
is publicly available at https://github.com/lyeskhalil/CORL.git.
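The classical greedy baseline that the learned policies are compared against is simple to state for the edge-weighted problem; the toy weight matrix below is illustrative:
```python
import numpy as np

def greedy_online_matching(weights):
    """Edge-weighted online bipartite matching baseline: each arriving
    online node is irrevocably matched to the available offline node
    with the largest positive edge weight."""
    taken = np.zeros(weights.shape[1], dtype=bool)
    value = 0.0
    for t in range(weights.shape[0]):
        w = np.where(taken, -np.inf, weights[t])
        j = int(np.argmax(w))
        if w[j] > 0:
            taken[j] = True
            value += w[j]
    return value

rng = np.random.default_rng(0)
print(greedy_online_matching(rng.random((5, 4))))
```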
Medical (3 papers)
【1】 Deep Variational Clustering Framework for Self-labeling of Large-scale Medical Images
Link: https://arxiv.org/abs/2109.10777
Authors: Farzin Soleymani, Mohammad Eslami, Tobias Elze, Bernd Bischl, Mina Rezaei
Affiliations: Department of Electrical and Computer Engineering, Technical University of Munich, Germany; Massachusetts Eye and Ear Hospital, Harvard Medical School, Boston, MA, USA; Department of Statistics, LMU Munich, Germany
Comments: arXiv admin note: text overlap with arXiv:2109.05232
Abstract: We propose a Deep Variational Clustering (DVC) framework for unsupervised
representation learning and clustering of large-scale medical images. DVC
simultaneously learns the multivariate Gaussian posterior through the
probabilistic convolutional encoder and the likelihood distribution with the
probabilistic convolutional decoder, and optimizes cluster label assignment.
Here, the learned multivariate Gaussian posterior captures the latent
distribution of a large set of unlabeled images. Then, we perform unsupervised
clustering on top of the variational latent space using a clustering loss. In
this approach, the probabilistic decoder helps to prevent the distortion of
data points in the latent space and to preserve the local structure of data
generating distribution. The training process can be considered a
self-training process that refines the latent space while iteratively
optimizing cluster assignments. We evaluated our proposed framework on three
public datasets that represented different medical imaging modalities. Our
experimental results show that our proposed framework generalizes better across
different datasets. It achieves compelling results on several medical imaging
benchmarks. Thus, our approach offers potential advantages over conventional
deep unsupervised learning in real-world applications. The source code of the
method and all the experiments are available publicly at:
https://github.com/csfarzin/DVC
【2】 Cramér-Rao bound-informed training of neural networks for quantitative MRI
Link: https://arxiv.org/abs/2109.10535
Authors: Xiaoxia Zhang, Quentin Duchemin, Kangning Liu, Sebastian Flassbeck, Cem Gultekin, Carlos Fernandez-Granda, Jakob Assländer
Comments: Xiaoxia Zhang, Quentin Duchemin, and Kangning Liu contributed equally to this work
Abstract: Neural networks are increasingly used to estimate parameters in quantitative
MRI, in particular in magnetic resonance fingerprinting. Their advantages over
the gold standard non-linear least square fitting are their superior speed and
their immunity to the non-convexity of many fitting problems. We find, however,
that in heterogeneous parameter spaces, i.e. in spaces in which the variance of
the estimated parameters varies considerably, good performance is hard to
achieve and requires arduous tweaking of the loss function, hyperparameters,
and the distribution of the training data in parameter space. Here, we address
these issues with a theoretically well-founded loss function: the Cramér-Rao
bound (CRB) provides a theoretical lower bound for the variance of an unbiased
estimator, and we propose to normalize the squared error with the respective CRB.
With this normalization, we balance the contributions of hard-to-estimate and
not-so-hard-to-estimate parameters and areas in parameter space, and avoid a
dominance of the former in the overall training loss. Further, the CRB-based
loss function equals one for a maximally-efficient unbiased estimator, which we
consider the ideal estimator. Hence, the proposed CRB-based loss function
provides an absolute evaluation metric. We compare a network trained with the
CRB-based loss with a network trained with the commonly used mean squared
error loss and demonstrate the advantages of the former in numerical, phantom,
and in vivo experiments.
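The proposed normalization is direct to express; this sketch assumes per-parameter CRB values are already computed for each training sample:
```python
import numpy as np

def crb_normalized_loss(theta_hat, theta, crb):
    """Squared error divided by the Cramér-Rao bound of each parameter.
    A maximally-efficient unbiased estimator scores 1 per parameter,
    which gives the loss an absolute scale."""
    return np.mean((theta_hat - theta) ** 2 / crb)
```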
【3】 Towards The Automatic Coding of Medical Transcripts to Improve Patient-Centered Communication
Link: https://arxiv.org/abs/2109.10514
Authors: Gilchan Park, Julia Taylor Rayz, Cleveland G. Shields
Affiliations: Computer and Information Technology; Human Development & Family Studies, Purdue University
Comments: Society for Design and Process Science (SDPS) 2016
Abstract: This paper aims to provide an approach for automatic coding of
physician-patient communication transcripts to improve patient-centered
communication (PCC). PCC is a central part of high-quality health care. To
improve PCC, dialogues between physicians and patients have been recorded and
tagged with predefined codes. Trained human coders have manually coded the
transcripts. Since it entails huge labor costs and risks human error,
automatic coding methods should be considered for efficiency and effectiveness.
We adopted three machine learning algorithms (Naïve Bayes, Random Forest, and
Support Vector Machine) to categorize lines in transcripts into corresponding
codes. The result showed that there is evidence to distinguish the codes, and
this is considered to be sufficient for training of human annotators.
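The three classifiers map to standard scikit-learn components; the transcript lines and PCC code labels below are invented placeholders:
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

lines = ["how are you feeling today", "take two tablets daily",
         "i am worried about the pain", "we will schedule a scan"]
codes = ["rapport", "instruction", "concern", "plan"]  # hypothetical codes

for name, clf in [("naive_bayes", MultinomialNB()),
                  ("random_forest", RandomForestClassifier()),
                  ("svm", LinearSVC())]:
    pipe = make_pipeline(TfidfVectorizer(), clf)  # tf-idf features per line
    pipe.fit(lines, codes)
    print(name, pipe.predict(["does the pain worry you"]))
```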
Clustering (3 papers)
【1】 Multi-Slice Clustering for 3-order Tensor Data
Link: https://arxiv.org/abs/2109.10803
Authors: Dina Faneva Andriantsiory, Joseph Ben Geloun, Mustapha Lebbah
Affiliations: Laboratoire d'Informatique de Paris Nord (LIPN), Université Sorbonne Paris Nord
Abstract: Several methods of triclustering of three dimensional data require the
specification of the cluster size in each dimension. This introduces a certain
degree of arbitrariness. To address this issue, we propose a new method, namely
the multi-slice clustering (MSC) for a 3-order tensor data set. We analyse, in
each dimension or tensor mode, the spectral decomposition of each tensor slice,
i.e. a matrix. Thus, we define a similarity measure between matrix slices up to
a threshold (precision) parameter, and from that, identify a cluster. The
intersection of all partial clusters provides the desired triclustering. The
effectiveness of our algorithm is shown on both synthetic and real-world data
sets.
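A simplified reading of the slice-comparison step: take the leading singular vector of each matrix slice along one mode and group slices whose agreement exceeds the precision threshold. The exact similarity measure and the intersection over all three modes are left out here:
```python
import numpy as np

def slice_cluster(T, seed=0, precision=0.9):
    """Group slices of a 3-order tensor (along mode 0) whose leading left
    singular vectors agree, up to sign, beyond a precision threshold."""
    tops = np.array([np.linalg.svd(T[k], full_matrices=False)[0][:, 0]
                     for k in range(T.shape[0])])
    sim = np.abs(tops @ tops.T)  # cosine similarity up to sign
    return np.where(sim[seed] >= precision)[0]

rng = np.random.default_rng(0)
T = rng.normal(scale=0.1, size=(6, 10, 12))
T[:3] += np.outer(rng.normal(size=10), rng.normal(size=12))  # shared pattern
print(slice_cluster(T))  # expected to recover slices 0-2
```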
【2】 Diarisation using Location tracking with agglomerative clustering
Link: https://arxiv.org/abs/2109.10598
Authors: Jeremy H. M. Wong, Igor Abramovski, Xiong Xiao, Yifan Gong
Affiliations: Microsoft, USA
Abstract: Previous works have shown that spatial location information can be
complementary to speaker embeddings for a speaker diarisation task. However,
the models used often assume that speakers are fairly stationary throughout a
meeting. This paper proposes to relax this assumption, by explicitly modelling
the movements of speakers within an Agglomerative Hierarchical Clustering (AHC)
diarisation framework. Kalman filters, which track the locations of speakers,
are used to compute log-likelihood ratios that contribute to the cluster
affinity computations for the AHC merging and stopping decisions. Experiments
show that the proposed approach is able to yield improvements on a Microsoft
rich meeting transcription task, compared to methods that do not use location
information or that make stationarity assumptions.
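A minimal version of the affinity term: score a candidate merge by the log-likelihood of the pooled location sequence under a single random-walk Kalman filter. The 1-D state, noise values, and merge criterion are illustrative assumptions, not the paper's configuration:
```python
import numpy as np

def kalman_loglik(zs, q=0.01, r=0.1):
    """Log-likelihood of 1-D location observations under a random-walk
    Kalman filter; one track per merge hypothesis."""
    x, P, ll = zs[0], r, 0.0
    for z in zs[1:]:
        P += q                   # predict: random-walk process noise
        S = P + r                # innovation variance
        ll += -0.5 * (np.log(2 * np.pi * S) + (z - x) ** 2 / S)
        K = P / S                # Kalman gain and update
        x += K * (z - x)
        P *= 1 - K
    return ll

rng = np.random.default_rng(0)
same = np.r_[rng.normal(0.0, 0.1, 50), rng.normal(0.1, 0.1, 50)]
diff = np.r_[rng.normal(0.0, 0.1, 50), rng.normal(3.0, 0.1, 50)]
print(kalman_loglik(same) > kalman_loglik(diff))  # same-location merge wins
```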
【3】 High-dimensional Bayesian Optimization for CNN Auto Pruning with Clustering and Rollback
Link: https://arxiv.org/abs/2109.10591
Authors: Jiandong Mu, Hanwei Fan, Wei Zhang
Affiliations: The Hong Kong University of Science and Technology (HKUST), China
Comments: 7 pages, with 1 page for references
Abstract: Pruning has been widely used to slim convolutional neural network (CNN)
models to achieve a good trade-off between accuracy and model size so that the
pruned models become feasible for power-constrained devices such as mobile
phones. This process can be automated to avoid the expensive hand-crafted
efforts and to explore a large pruning space automatically so that the
high-performance pruning policy can be achieved efficiently. Nowadays,
reinforcement learning (RL) and Bayesian optimization (BO)-based auto pruners
are widely used due to their solid theoretical foundation, universality, and
high compressing quality. However, the RL agent suffers from long training
times and high variance of results, while the BO agent is time-consuming for
high-dimensional design spaces. In this work, we propose an enhanced BO agent
to obtain significant acceleration for auto pruning in high-dimensional design
spaces. To achieve this, a novel clustering algorithm is proposed to reduce the
dimension of the design space to speed up the search process. Then, a
roll-back algorithm is proposed to recover the high-dimensional design space so
that higher pruning accuracy can be obtained. We validate our proposed method
on ResNet, MobileNet, and VGG models, and our experiments show that the
proposed method significantly improves the accuracy of BO when pruning very
deep CNN models. Moreover, our method achieves lower variance and shorter time
than the RL-based counterpart.
Autonomous driving | vehicles | lane detection, etc. (3 papers)
【1】 Geo-Context Aware Study of Vision-Based Autonomous Driving Models and Spatial Video Data
Link: https://arxiv.org/abs/2109.10895
Authors: Suphanut Jamonnak, Ye Zhao, Xinyi Huang, Md Amiruzzaman
Affiliations: Jamonnak, Zhao, and Huang are with Kent State University; Amiruzzaman is with West Chester University
Comments: 11 pages, 8 figures, 1 table. Accepted for publication in IEEE Transactions on Visualization and Computer Graphics
Abstract: Vision-based deep learning (DL) methods have made great progress in learning
autonomous driving models from large-scale crowd-sourced video datasets. They
are trained to predict instantaneous driving behaviors from video data captured
by on-vehicle cameras. In this paper, we develop a geo-context aware
visualization system for the study of Autonomous Driving Model (ADM)
predictions together with large-scale ADM video data. The visual study is
seamlessly integrated with the geographical environment by combining DL model
performance with geospatial visualization techniques. Model performance
measures can be studied together with a set of geospatial attributes over map
views. Users can also discover and compare prediction behaviors of multiple DL
models in both city-wide and street-level analysis, together with road images
and video contents. Therefore, the system provides a new visual exploration
platform for DL model designers in autonomous driving. Use cases and domain
expert evaluation show the utility and effectiveness of the visualization
system.
【2】 Early Lane Change Prediction for Automated Driving Systems Using Multi-Task Attention-based Convolutional Neural Networks
Link: https://arxiv.org/abs/2109.10742
Authors: Sajjad Mozaffari, Eduardo Arnold, Mehrdad Dianati, Saber Fallah
Comments: 12 pages, 12 figures
Abstract: Lane change (LC) is one of the safety-critical manoeuvres in highway driving
according to various road accident records. Thus, reliably predicting such
manoeuvre in advance is critical for the safe and comfortable operation of
automated driving systems. The majority of previous studies rely on detecting a
manoeuvre that has been already started, rather than predicting the manoeuvre
in advance. Furthermore, most of the previous works do not estimate the key
timings of the manoeuvre (e.g., crossing time), which can actually yield more
useful information for the decision making in the ego vehicle. To address these
shortcomings, this paper proposes a novel multi-task model to simultaneously
estimate the likelihood of LC manoeuvres and the time-to-lane-change (TTLC). In
both tasks, an attention-based convolutional neural network (CNN) is used as a
shared feature extractor from a bird's eye view representation of the driving
environment. The spatial attention used in the CNN model improves the feature
extraction process by focusing on the most relevant areas of the surrounding
environment. In addition, two novel curriculum learning schemes are employed to
train the proposed approach. The extensive evaluation and comparative analysis
of the proposed method in existing benchmark datasets show that the proposed
method outperforms state-of-the-art LC prediction models, particularly
considering long-term prediction performance.
【3】 Vehicle Behavior Prediction and Generalization Using Imbalanced Learning Techniques
Link: https://arxiv.org/abs/2109.10656
Authors: Theodor Westny, Erik Frisk, Björn Olofsson
Affiliations: Linköping University
Comments: Accepted for 2021 IEEE 24th International Conference on Intelligent Transportation Systems (ITSC)
Abstract: The use of learning-based methods for vehicle behavior prediction is a
promising research topic. However, many publicly available data sets suffer
from class distribution skews, which limit learning performance if not
addressed. This paper proposes an interaction-aware prediction model consisting
of an LSTM autoencoder and SVM classifier. Additionally, an imbalanced learning
technique, the multiclass balancing ensemble is proposed. Evaluations show that
the method enhances model performance, resulting in improved classification
accuracy. Good generalization properties of learned models are important and
therefore a generalization study is done where models are evaluated on unseen
traffic data with dissimilar traffic behavior stemming from different road
configurations. This is realized by using two distinct highway traffic
recordings, the publicly available NGSIM US-101 and I80 data sets. Moreover,
methods for encoding structural and static features into the learning process
for improved generalization are evaluated. The resulting methods show
substantial improvements in classification as well as generalization
performance.
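The balancing-ensemble idea can be sketched independently of the LSTM autoencoder: each member classifier is fit on a class-balanced resample and predictions are averaged. A minimal scikit-learn sketch, assuming X already holds the encoded features; the member count and the undersampling rule are illustrative choices, not the paper's exact procedure.

```python
# Minimal sketch of a multiclass balancing ensemble (illustrative choices).
import numpy as np
from sklearn.svm import SVC

def balanced_resample(X, y, rng):
    classes, counts = np.unique(y, return_counts=True)
    n = counts.min()                       # undersample to the rarest class
    idx = np.concatenate([rng.choice(np.flatnonzero(y == c), n, replace=False)
                          for c in classes])
    return X[idx], y[idx]

def fit_balancing_ensemble(X, y, n_members=10, seed=0):
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        Xb, yb = balanced_resample(X, y, rng)
        members.append(SVC(probability=True).fit(Xb, yb))
    return members

def predict(members, X):
    proba = np.mean([m.predict_proba(X) for m in members], axis=0)
    return members[0].classes_[proba.argmax(axis=1)]

# Toy imbalanced data standing in for the autoencoder features.
X = np.random.randn(300, 8)
y = np.r_[np.zeros(250, int), np.ones(50, int)]
print(predict(fit_balancing_ensemble(X, y), X[:5]))
```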
推理|分析|理解|解释(5篇)
【1】 Causal Inference in Non-linear Time-series using Deep Networks and Knockoff Counterfactuals
标题:基于深度网络和仿冒反事实的非线性时间序列因果推断
链接:https://arxiv.org/abs/2109.10817
作者:Wasim Ahmad,Maha Shadaydeh,Joachim Denzler
机构:Computer Vision Group, Friedrich Schiller University Jena, German Aerospace Center (DLR), Institute of Data Science, Jena, Germany
备注:None
摘要:估计因果关系对于理解多元时间序列中的复杂相互作用至关重要。变量的非线性耦合是准确估计因果关系的主要挑战之一。在本文中,我们建议使用深度自回归网络(DeepAR)结合反事实分析来推断多元时间序列中的非线性因果关系。我们使用DeepAR概率预测扩展了Granger因果关系的概念。由于深度网络既不能处理缺失输入也不能处理分布外干预,我们建议使用仿冒(Knockoffs)框架(Barber and Candès, 2015)生成干预变量,从而进行反事实概率预测。在给定观察变量的情况下,仿冒样本独立于它们的输出,并且可以在不改变数据基本分布的情况下与其对应变量进行交换。我们在合成的和真实的时间序列数据集上测试我们的方法。总体而言,我们的方法在检测多元时间序列中的非线性因果关系方面优于广泛使用的向量自回归格兰杰因果关系和PCMCI。
摘要:Estimating causal relations is vital in understanding the complex
interactions in multivariate time series. Non-linear coupling of variables is
one of the major challenges in the accurate estimation of cause-effect relations. In
this paper, we propose to use deep autoregressive networks (DeepAR) in tandem
with counterfactual analysis to infer nonlinear causal relations in
multivariate time series. We extend the concept of Granger causality using
probabilistic forecasting with DeepAR. Since deep networks can neither handle
missing input nor out-of-distribution intervention, we propose to use the
Knockoffs framework (Barber and Candès, 2015) for generating intervention
variables and consequently counterfactual probabilistic forecasting. Knockoff
samples are independent of their output given the observed variables and
exchangeable with their counterpart variables without changing the underlying
distribution of the data. We test our method on synthetic as well as real-world
time series datasets. Overall our method outperforms the widely used vector
autoregressive Granger causality and PCMCI in detecting nonlinear causal
dependency in multivariate time series.
【2】 A Hierarchical Network-Oriented Analysis of User Participation in Misinformation Spread on WhatsApp
标题:WhatsApp错误信息传播中用户参与的层次化网络分析
链接:https://arxiv.org/abs/2109.10462
作者:Gabriel Peres Nobre,Carlos H. G. Ferreira,Jussara M. Almeida
机构:Department of Computer Science, Universidade Federal de Minas Gerais, Brazil, Department of Computing and Systems, Universidade Federal de Ouro Preto, Brazil, Department of Electronics and Telecommunications, Politecnico di Torino, Italy
备注:Paper Accepted in Information Processing & Management, Elsevier
摘要:WhatsApp近年来成为许多国家的一个主要通信平台。尽管WhatsApp只提供一对一和小组对话,但事实证明,WhatsApp能够形成丰富的基础网络,跨越现有小组的边界,具有有利于信息传播的结构性特征。事实上,据报道,WhatsApp被用作误导活动的论坛,在一些国家造成了重大的社会、政治和经济后果。在本文中,我们旨在通过研究连接共享同一内容的用户的网络,补充WhatsApp上错误信息传播的最新研究,这些研究主要集中在内容属性和传播动态上。具体而言,我们通过关注三个角度,对参与错误信息传播的用户进行分层网络描述:个人、WhatsApp组和用户社区,即有意或无意经常共享相同内容的用户分组。通过分析共享和网络拓扑特性,我们的研究为WhatsApp用户如何利用连接不同群体的底层网络在平台上传播错误信息提供了有价值的见解。
摘要:WhatsApp emerged as a major communication platform in many countries in
recent years. Despite offering only one-to-one and small group conversations,
WhatsApp has been shown to enable the formation of a rich underlying network,
crossing the boundaries of existing groups, and with structural properties that
favor information dissemination at large. Indeed, WhatsApp has reportedly been
used as a forum of misinformation campaigns with significant social, political
and economic consequences in several countries. In this article, we aim at
complementing recent studies on misinformation spread on WhatsApp, mostly
focused on content properties and propagation dynamics, by looking into the
network that connects users sharing the same piece of content. Specifically, we
present a hierarchical network-oriented characterization of the users engaged
in misinformation spread by focusing on three perspectives: individuals,
WhatsApp groups and user communities, i.e., groupings of users who,
intentionally or not, share the same content disproportionately often. By
analyzing sharing and network topological properties, our study offers valuable
insights into how WhatsApp users leverage the underlying network connecting
different groups to gain large reach in the spread of misinformation on the
platform.
【3】 Towards a Real-Time Facial Analysis System
标题:一种面向实时人脸分析的系统
链接:https://arxiv.org/abs/2109.10393
作者:Bishwo Adhikari,Xingyang Ni,Esa Rahtu,Heikki Huttunen
机构:Tampere University, Finland, Visy Oy, Finland
备注:Accepted in IEEE MMSP 2021
摘要:人脸分析是计算机视觉中一个非常活跃的研究领域,具有许多实际应用。现有的大多数研究都集中于解决一项特定任务并最大限度地提高其性能。对于一个完整的面部分析系统,需要高效地解决这些任务,以确保获得流畅的体验。在这项工作中,我们提出了一个实时人脸分析系统的系统级设计。通过收集用于对象检测、分类和回归的深层神经网络,该系统可以识别出现在相机视图中的每个人的年龄、性别、面部表情和面部相似性。我们研究了单个任务的并行化和相互作用。结果表明,该系统的识别精度与现有方法相当,识别速度满足实时性要求。此外,我们提出了一个多任务网络,用于联合预测前三个属性,即年龄、性别和面部表情。源代码和经过训练的模型可在https://github.com/mahehu/TUT-live-age-estimator.
摘要:Facial analysis is an active research area in computer vision, with many
practical applications. Most of the existing studies focus on addressing one
specific task and maximizing its performance. For a complete facial analysis
system, one needs to solve these tasks efficiently to ensure a smooth
experience. In this work, we present a system-level design of a real-time
facial analysis system. With a collection of deep neural networks for object
detection, classification, and regression, the system recognizes age, gender,
facial expression, and facial similarity for each person that appears in the
camera view. We investigate the parallelization and interplay of individual
tasks. Results on common off-the-shelf architecture show that the system's
accuracy is comparable to the state-of-the-art methods, and the recognition
speed satisfies real-time requirements. Moreover, we propose a multitask
network for jointly predicting the first three attributes, i.e., age, gender,
and facial expression. Source code and trained models are available at
https://github.com/mahehu/TUT-live-age-estimator.
【4】 Sharp Analysis of Random Fourier Features in Classification
标题:随机傅立叶特征在分类中的锐化分析
链接:https://arxiv.org/abs/2109.10623
作者:Zhu Li
机构:Gatsby Computational Neuroscience Unit, University College London, United Kingdom
摘要:我们研究了支持向量机和logistic回归等Lipschitz连续损失函数的随机Fourier特征分类的理论性质。利用正则性条件,我们首次证明了随机傅立叶特征分类仅使用$\Omega(\sqrt{n}\log n)$特征就可以达到$O(1/\sqrt{n})$学习率,而不是先前结果建议的$\Omega(n)$特征。我们的研究涵盖了标准特征采样方法,我们减少了所需特征的数量,以及一种与问题相关的采样方法,该方法在保持最佳泛化特性的同时进一步减少了特征的数量。此外,我们证明了在马萨特的低噪声假设下,随机傅立叶特征分类可以获得两种采样方案的快速$O(1/n)$学习率。我们的结果证明了随机傅立叶特征近似在降低计算复杂度方面的潜在有效性(从时间上的$O(n^3)$和空间上的$O(n^2)$分别降低到$O(n^2)$和$O(n\sqrt{n})$),而无需权衡统计预测精度。此外,我们分析中实现的权衡至少与文献中最坏情况下的最优结果相同,并显著改善了良性规律性条件下的最优结果。
摘要:We study the theoretical properties of random Fourier features classification
with Lipschitz continuous loss functions such as support vector machine and
logistic regression. Utilizing the regularity condition, we show for the first
time that random Fourier features classification can achieve $O(1/\sqrt{n})$
learning rate with only $\Omega(\sqrt{n} \log n)$ features, as opposed to
$\Omega(n)$ features suggested by previous results. Our study covers the
standard feature sampling method for which we reduce the number of features
required, as well as a problem-dependent sampling method which further reduces
the number of features while still keeping the optimal generalization property.
Moreover, we prove that the random Fourier features classification can obtain a
fast $O(1/n)$ learning rate for both sampling schemes under Massart's low noise
assumption. Our results demonstrate the potential effectiveness of random
Fourier features approximation in reducing the computational complexity
(roughly from $O(n^3)$ in time and $O(n^2)$ in space to $O(n^2)$ and
$O(n\sqrt{n})$ respectively) without having to trade-off the statistical
prediction accuracy. In addition, the achieved trade-off in our analysis is at
least the same as the optimal results in the literature under the worst case
scenario and significantly improves the optimal results under benign regularity
conditions.
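A minimal sketch of the practical recipe suggested by this theory, using scikit-learn's RBFSampler to build s ≈ √n log n random Fourier features in front of a Lipschitz-loss classifier (logistic regression); the synthetic data, gamma, and constant factor are arbitrary illustration choices.

```python
# Minimal sketch: random Fourier features + logistic regression.
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

n = 10_000
X = np.random.randn(n, 20)
y = (np.linalg.norm(X[:, :5], axis=1) > 2.2).astype(int)   # toy labels

s = int(np.sqrt(n) * np.log(n))        # ~ sqrt(n) log n features, per the theory
clf = make_pipeline(RBFSampler(gamma=0.5, n_components=s, random_state=0),
                    LogisticRegression(max_iter=1000))
clf.fit(X, y)
print(clf.score(X, y))
```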
【5】 Robust marginalization of baryonic effects for cosmological inference at the field level
标题:场水平宇宙学推断中重子效应的稳健边际化
链接:https://arxiv.org/abs/2109.10360
作者:Francisco Villaescusa-Navarro,Shy Genel,Daniel Angles-Alcazar,David N. Spergel,Yin Li,Benjamin Wandelt,Leander Thiele,Andrina Nicola,Jose Manuel Zorrilla Matilla,Helen Shao,Sultan Hassan,Desika Narayanan,Romeel Dave,Mark Vogelsberger
机构:Department of Astrophysical Sciences, Princeton University, Peyton Hall, Princeton NJ , USA, Center for Computational Astrophysics, Flatiron Institute,th Avenue, New York, NY, USA, Columbia Astrophysics Laboratory, Columbia University, New York, NY, USA
备注:7 pages, 4 figures. Second paper of a series of four. The 2D maps, codes, and network weights used in this paper are publicly available at this https URL
摘要:我们训练神经网络从包含总质量面密度的$(25\,h^{-1}{\rm Mpc})^2$二维地图中执行无似然推断,这些地图来自CAMELS项目的数千次流体动力学模拟。我们表明,该网络可以从所有已分辨尺度($\gtrsim 100\,h^{-1}{\rm kpc}$)中提取超出一点函数和功率谱的信息,同时在场的层面上对重子物理进行稳健的边缘化:该模型可以从与训练所用模拟完全不同的模拟中推断出$\Omega_{\rm m}$($\pm 4\%$)和$\sigma_8$($\pm 2.5\%$)的值。
摘要:We train neural networks to perform likelihood-free inference from
$(25\,h^{-1}{\rm Mpc})^2$ 2D maps containing the total mass surface density
from thousands of hydrodynamic simulations of the CAMELS project. We show that
the networks can extract information beyond one-point functions and power
spectra from all resolved scales ($\gtrsim 100\,h^{-1}{\rm kpc}$) while
performing a robust marginalization over baryonic physics at the field level:
the model can infer the value of $\Omega_{\rm m} (\pm 4\%)$ and $\sigma_8 (\pm
2.5\%)$ from simulations completely different to the ones used to train it.
检测相关(3篇)
【1】 Pix2seq: A Language Modeling Framework for Object Detection
标题:Pix2seq:一种面向目标检测的语言建模框架
链接:https://arxiv.org/abs/2109.10852
作者:Ting Chen,Saurabh Saxena,Lala Li,David J. Fleet,Geoffrey Hinton
机构:Google Research, Brain Team
摘要:本文介绍了一种简单通用的目标检测框架Pix2Seq。与显式集成任务先验知识的现有方法不同,我们简单地将对象检测转换为以观察到的像素输入为条件的语言建模任务。对象描述(如边界框和类标签)表示为离散标记序列,我们训练神经网络感知图像并生成所需序列。我们的方法主要基于这样一种直觉,即如果神经网络知道对象的位置和内容,我们只需要教它如何读出它们。除了使用特定于任务的数据扩充,我们的方法对任务进行了最少的假设,但与高度专业化和优化的检测算法相比,它在具有挑战性的COCO数据集上取得了具有竞争力的结果。
摘要:This paper presents Pix2Seq, a simple and generic framework for object
detection. Unlike existing approaches that explicitly integrate prior knowledge
about the task, we simply cast object detection as a language modeling task
conditioned on the observed pixel inputs. Object descriptions (e.g., bounding
boxes and class labels) are expressed as sequences of discrete tokens, and we
train a neural net to perceive the image and generate the desired sequence. Our
approach is based mainly on the intuition that if a neural net knows about
where and what the objects are, we just need to teach it how to read them out.
Beyond the use of task-specific data augmentations, our approach makes minimal
assumptions about the task, yet it achieves competitive results on the
challenging COCO dataset, compared to highly specialized and well optimized
detection algorithms.
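The core representational trick can be sketched without the model itself: boxes and labels become a flat sequence of discrete tokens that a network would emit autoregressively. A minimal sketch, where the bin count and vocabulary layout are assumptions rather than the paper's exact configuration.

```python
# Minimal sketch: serialize objects as (ymin, xmin, ymax, xmax, class) tokens.
N_BINS = 500               # coordinate quantization bins (assumed)
N_CLASSES = 80             # e.g. COCO classes; class tokens follow coord tokens
EOS = N_BINS + N_CLASSES   # end-of-sequence token

def box_to_tokens(box, label, img_h, img_w):
    ymin, xmin, ymax, xmax = box
    coords = [ymin / img_h, xmin / img_w, ymax / img_h, xmax / img_w]
    tokens = [min(int(c * N_BINS), N_BINS - 1) for c in coords]
    return tokens + [N_BINS + label]

def objects_to_sequence(objects, img_h, img_w):
    seq = []
    for box, label in objects:
        seq += box_to_tokens(box, label, img_h, img_w)
    return seq + [EOS]

# Two toy objects in a 480x640 image -> one flat token sequence.
print(objects_to_sequence([((48, 64, 240, 320), 17),
                           ((100, 200, 460, 620), 3)], 480, 640))
```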
【2】 A Robust Asymmetric Kernel Function for Bayesian Optimization, with Application to Image Defect Detection in Manufacturing Systems
标题:一种稳健的贝叶斯优化非对称核函数及其在制造系统图像缺陷检测中的应用
链接:https://arxiv.org/abs/2109.10898
作者:Areej AlBahar,Inyoung Kim,Xiaowei Yue
机构:Department of Industrial and Management Systems Engineering, Kuwait University
摘要:复杂工程系统中的一些响应面函数通常是高度非线性的、未成形的,且评估费用昂贵。为了应对这一挑战,贝叶斯优化(Bayesian optimization)通过对目标函数的后验分布进行顺序设计,是一种用于寻找黑箱函数全局最优解的关键方法。核函数在形成估计函数的后验分布方面起着重要作用。广泛使用的核函数,如径向基函数(RBF),非常容易受到异常值的影响;异常值的存在导致其高斯过程替代模型是零星的。本文提出了一种稳健的核函数——不对称弹性网络径向基函数(AEN-RBF)。评估了它作为核函数的有效性和计算复杂性。与基线RBF核相比,我们从理论上证明了AEN-RBF在温和条件下可以实现更小的均方预测误差。所提出的AEN-RBF核函数也能更快地收敛到全局最优。我们还表明,AEN-RBF核函数对异常值不太敏感,因此提高了相应的高斯过程贝叶斯优化的鲁棒性。通过对合成和实际优化问题的广泛评估,我们表明AEN-RBF优于现有的基准核函数。
摘要:Some response surface functions in complex engineering systems are usually
highly nonlinear, unformed, and expensive-to-evaluate. To tackle this
challenge, Bayesian optimization, which conducts sequential design via a
posterior distribution over the objective function, is a critical method used
to find the global optimum of black-box functions. Kernel functions play an
important role in shaping the posterior distribution of the estimated function.
The widely used kernel function, e.g., radial basis function (RBF), is very
vulnerable and susceptible to outliers; the existence of outliers causes
its Gaussian process surrogate model to be sporadic. In this paper, we propose
a robust kernel function, Asymmetric Elastic Net Radial Basis Function
(AEN-RBF). Its validity as a kernel function and computational complexity are
evaluated. When compared to the baseline RBF kernel, we prove theoretically
that AEN-RBF can realize smaller mean squared prediction error under mild
conditions. The proposed AEN-RBF kernel function can also realize faster
convergence to the global optimum. We also show that the AEN-RBF kernel
function is less sensitive to outliers, and hence improves the robustness of
the corresponding Bayesian optimization with Gaussian processes. Through
extensive evaluations carried out on synthetic and real-world optimization
problems, we show that AEN-RBF outperforms existing benchmark kernel functions.
【3】 Entropic Issues in Likelihood-Based OOD Detection
标题:基于似然的OOD检测中的熵问题
链接:https://arxiv.org/abs/2109.10794
作者:Anthony L. Caterini,Gabriel Loaiza-Ganem
机构:Layer 6 AI
备注:NeurIPS Workshop Submission
摘要:由最大似然法训练的深层生成模型仍然是对数据进行概率推理的常用方法。然而,已经观察到,它们可能为分布外(OOD)数据分配比分布内数据更高的似然,因此这些似然值的含义令人质疑。在这项工作中,我们对这一现象提供了一个新的视角,将平均似然分解为一个KL散度项和一个熵项。我们认为后者可以解释上述奇怪的OOD行为:它会压低高熵数据集上的似然值。虽然我们的想法很简单,但我们尚未在文献中看到对它的探讨。该分析为基于似然比的OOD检测方法的成功提供了进一步的解释,因为有问题的熵项在期望意义下相互抵消。最后,我们讨论了这一观察结果如何与最近使用流形支持(manifold-supported)模型进行OOD检测的成功相关,对这类模型而言上述分解并不成立。
摘要:Deep generative models trained by maximum likelihood remain very popular
methods for reasoning about data probabilistically. However, it has been
observed that they can assign higher likelihoods to out-of-distribution (OOD)
data than in-distribution data, thus calling into question the meaning of these
likelihood values. In this work we provide a novel perspective on this
phenomenon, decomposing the average likelihood into a KL divergence term and an
entropy term. We argue that the latter can explain the curious OOD behaviour
mentioned above, suppressing likelihood values on datasets with higher entropy.
Although our idea is simple, we have not seen it explored yet in the
literature. This analysis provides further explanation for the success of OOD
detection methods based on likelihood ratios, as the problematic entropy term
cancels out in expectation. Finally, we discuss how this observation relates to
recent success in OOD detection with manifold-supported models, for which the
above decomposition does not hold.
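To make the decomposition explicit, write $q$ for the distribution the evaluation data is drawn from and $p_\theta$ for the trained model (notation assumed here; the abstract does not fix symbols). The average log-likelihood then splits as below, and the entropy term cancels in expectation when two models are compared by a likelihood ratio:

```latex
% Average log-likelihood decomposed into a KL term and an entropy term:
\mathbb{E}_{x \sim q}\left[\log p_\theta(x)\right]
  = -\,\mathrm{KL}(q \,\|\, p_\theta) - H(q),
\qquad H(q) = -\,\mathbb{E}_{x \sim q}\left[\log q(x)\right].

% For a likelihood-ratio test between two models, H(q) cancels:
\mathbb{E}_{x \sim q}\left[\log \frac{p_1(x)}{p_2(x)}\right]
  = \mathrm{KL}(q \,\|\, p_2) - \mathrm{KL}(q \,\|\, p_1).
```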
分类|识别(6篇)
【1】 Coarse2Fine: Fine-grained Text Classification on Coarsely-grained Annotated Data
标题:Coarse2Fine:基于粗粒度标注数据的细粒度文本分类
链接:https://arxiv.org/abs/2109.10856
作者:Dheeraj Mekala,Varun Gangal,Jingbo Shang
机构: Department of Computer Science and Engineering, University of California San Diego, CA, USA, Language Technologies Institute, Carnegie Mellon University, PA, USA, Halıcıo˘glu Data Science Institute, University of California San Diego, CA, USA
备注:Accepted to appear in EMNLP 2021
摘要:现有的文本分类方法主要关注固定标签集,而随着每个标签的样本数量增加,许多实际应用程序需要扩展到新的细粒度类。为了适应这些需求,我们引入了一个新的问题,称为粗粒度到细粒度分类,其目的是对粗注释数据执行细粒度分类。与要求新的细粒度人工注释不同,我们选择利用标签表面名称作为唯一的人工指导,并将丰富的预训练生成语言模型编织到迭代弱监督策略中。具体地说,我们首先提出一个标签条件的微调公式来调整这些生成器以完成我们的任务。此外,我们设计了一个基于粗-细标签约束的正则化目标,该约束源于我们的问题设置,使我们比先前的公式有了进一步的改进。我们的框架使用微调生成模型对伪训练数据进行采样以训练分类器,并对真实的未标记数据进行引导以进行模型细化。在两个真实数据集上进行的大量实验和案例研究表明,其性能优于SOTA零样本分类基线。
摘要:Existing text classification methods mainly focus on a fixed label set,
whereas many real-world applications require extending to new fine-grained
classes as the number of samples per label increases. To accommodate such
requirements, we introduce a new problem called coarse-to-fine grained
classification, which aims to perform fine-grained classification on coarsely
annotated data. Instead of asking for new fine-grained human annotations, we
opt to leverage label surface names as the only human guidance and weave in
rich pre-trained generative language models into the iterative weak supervision
strategy. Specifically, we first propose a label-conditioned finetuning
formulation to attune these generators for our task. Furthermore, we devise a
regularization objective based on the coarse-fine label constraints derived
from our problem setting, giving us even further improvements over the prior
formulation. Our framework uses the fine-tuned generative models to sample
pseudo-training data for training the classifier, and bootstraps on real
unlabeled data for model refinement. Extensive experiments and case studies on
two real-world datasets demonstrate superior performance over SOTA zero-shot
classification baselines.
【2】 BFClass: A Backdoor-free Text Classification Framework
标题:BFClass:一种无后门的文本分类框架
链接:https://arxiv.org/abs/2109.10855
作者:Zichao Li,Dheeraj Mekala,Chengyu Dong,Jingbo Shang
机构: Department of Electrical and Computer Engineering, University of California San Diego, CA, USA, Department of Computer Science and Engineering, University of California San Diego, CA, USA
备注:Accepted to appear in Findings of EMNLP 2021
摘要:后门攻击通过注入触发器和修改标签毒害训练数据的子集,从而在模型中引入人为漏洞。人们已经探索了各种各样的触发器设计策略来攻击文本分类器,然而,防御这种攻击仍然是一个开放的问题。在这项工作中,我们提出了BFClass,一种新的有效的无后门文本分类训练框架。BFClass的主干是一个预先训练的鉴别器,用于预测损坏输入中的每个标记是否被一个屏蔽语言模型替换。为了识别触发器,我们利用该鉴别器从每个训练样本中定位最可疑的标记,然后通过考虑它们与特定标签的关联强度来提取一个简明的标记集。为了识别中毒子集,我们检查训练样本,将这些已识别的触发器作为最可疑的标记,并检查移除触发器是否会改变中毒模型的预测。大量的实验表明,BFClass可以识别所有的触发器,去除95%的中毒训练样本,错误警报非常有限,并且实现了与在良性训练数据上训练的模型几乎相同的性能。
摘要:Backdoor attacks introduce artificial vulnerabilities into the model by
poisoning a subset of the training data via injecting triggers and modifying
labels. Various trigger design strategies have been explored to attack text
classifiers, however, defending such attacks remains an open problem. In this
work, we propose BFClass, a novel efficient backdoor-free training framework
for text classification. The backbone of BFClass is a pre-trained discriminator
that predicts whether each token in the corrupted input was replaced by a
masked language model. To identify triggers, we utilize this discriminator to
locate the most suspicious token from each training sample and then distill a
concise set by considering their association strengths with particular labels.
To recognize the poisoned subset, we examine the training samples with these
identified triggers as the most suspicious token, and check if removing the
trigger will change the poisoned model's prediction. Extensive experiments
demonstrate that BFClass can identify all the triggers, remove 95% poisoned
training samples with very limited false alarms, and achieve almost the same
performance as the models trained on the benign training data.
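The trigger-locating step can be approximated with an off-the-shelf replaced-token-detection discriminator. A minimal sketch using the public ELECTRA discriminator from Hugging Face; BFClass's own discriminator, its label-association distillation, and the poisoned-subset check are not shown.

```python
# Minimal sketch: flag the most suspicious token with an RTD discriminator.
import torch
from transformers import ElectraForPreTraining, ElectraTokenizerFast

name = "google/electra-small-discriminator"
tok = ElectraTokenizerFast.from_pretrained(name)
disc = ElectraForPreTraining.from_pretrained(name).eval()

def most_suspicious_token(sentence):
    # Replaced-token-detection logits: positive means "likely not original".
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = disc(**enc).logits[0]
    i = int(logits[1:-1].argmax()) + 1          # skip [CLS] and [SEP]
    token = tok.convert_ids_to_tokens(int(enc["input_ids"][0][i]))
    return token, float(logits[i])

# A crude stand-in for a poisoned sample carrying the rare trigger token "cf".
print(most_suspicious_token("the movie was cf great and moving"))
```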
【3】 Improved Multi-label Classification with Frequent Label-set Mining and Association
标题:基于频繁标签集挖掘和关联的改进多标签分类
链接:https://arxiv.org/abs/2109.10797
作者:Anwesha Law,Ashish Ghosh
机构:Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
摘要:多标签(ML)数据同时处理与单个样本关联的多个类。这导致多个类别重复出现,这表明它们之间存在某种相关性。本文探讨了类之间的相关性,以提高现有ML分类器的分类性能。提出了一种新的频繁标签集挖掘方法,从数据的标签集中提取这些相关类。同时考虑了类的共有(CP)和共无(CA)。从ML数据中挖掘的规则被进一步用于将类相关信息合并到现有的ML分类器中。通过使用CP-CA规则的新方法修改ML分类器生成的软分数。本文定义了确定分数和不确定分数的概念,提出的方法旨在借助于确定分数及其对应的CP-CA规则来改进不确定分数。对现有的三个最大似然分类器的十个最大似然数据集进行了实验分析,结果表明,它们的总体性能有了显著提高。
摘要:Multi-label (ML) data deals with multiple classes associated with individual
samples at the same time. This leads to the co-occurrence of several classes
repeatedly, which indicates some existing correlation among them. In this
article, the correlation among classes has been explored to improve the
classification performance of existing ML classifiers. A novel approach of
frequent label-set mining has been proposed to extract these correlated classes
from the label-sets of the data. Both co-presence (CP) and co-absence (CA) of
classes have been taken into consideration. The rules mined from the ML data
has been further used to incorporate class correlation information into
existing ML classifiers. The soft scores generated by an ML classifier are
modified through a novel approach using the CP-CA rules. A concept of certain
and uncertain scores has been defined here, where the proposed method aims to
improve the uncertain scores with the help of the certain scores and their
corresponding CP-CA rules. This has been experimentally analysed on ten ML
datasets for three existing ML classifiers, which shows substantial improvement
in their overall performance.
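A minimal sketch of the mining step for pairs of labels, counting both co-presence (CP) and co-absence (CA) over the label-sets; the support threshold is an illustrative choice, and the rule-based correction of classifier scores described above is not shown.

```python
# Minimal sketch: frequent CP/CA label-pair mining over multi-label data.
from itertools import combinations
from collections import Counter

def mine_pairs(label_sets, n_labels, min_support=0.2):
    cp, ca = Counter(), Counter()
    for s in label_sets:
        cp.update(combinations(sorted(s), 2))                       # co-present
        ca.update(combinations(sorted(set(range(n_labels)) - s), 2))  # co-absent
    n = len(label_sets)
    freq = lambda c: {pair: k / n for pair, k in c.items() if k / n >= min_support}
    return freq(cp), freq(ca)

data = [{0, 1}, {0, 1, 2}, {2}, {0, 1}, {1, 3}]
cp_rules, ca_rules = mine_pairs(data, n_labels=4)
print(cp_rules)   # e.g. {(0, 1): 0.6}
print(ca_rules)
```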
【4】 Natural Typing Recognition via Surface Electromyography
标题:基于表面肌电图的自然打字识别
链接:https://arxiv.org/abs/2109.10743
作者:Michael S. Crouch,Mingde Zheng,Michael S. Eggleston
机构:Data & Devices Group, Nokia Bell Labs, Mountain Ave, Murray Hill, NJ
摘要:通过使用计算机键盘作为手指记录设备,我们构建了现有最大的基于表面肌电图(sEMG)的手势识别数据集,并使用深度学习完全从测得的肌肉电位重建键入文本,达到90%以上的字符级准确率。我们优先考虑肌电信号的时间结构,而不是电极布局的空间结构,所用网络架构受实时口语转录架构的启发。我们的架构能够识别自然计算机打字中的快速动作,这些动作以不规则间隔出现,并且常常在时间上重叠。数据集的庞大规模还允许我们在人为降低空间或时间分辨率后研究手势识别,从而展示实时手势识别所需的系统能力。
摘要:By using a computer keyboard as a finger recording device, we construct the
largest existing dataset for gesture recognition via surface electromyography
(sEMG), and use deep learning to achieve over 90% character-level accuracy on
reconstructing typed text entirely from measured muscle potentials. We
prioritize the temporal structure of the EMG signal instead of the spatial
structure of the electrode layout, using network architectures inspired by
those used for real-time spoken language transcription. Our architecture
recognizes the rapid movements of natural computer typing, which occur at
irregular intervals and often overlap in time. The extensive size of our
dataset also allows us to study gesture recognition after synthetically
downgrading the spatial or temporal resolution, showing the system capabilities
necessary for real-time gesture recognition.
【5】 Classification with Nearest Disjoint Centroids
标题:最近不相交质心的分类
链接:https://arxiv.org/abs/2109.10436
作者:Nicolas Fraiman,Zichao Li
摘要:本文提出了一种新的基于最近质心的分类方法,称之为最近不相交质心分类器。我们的方法与最近质心分类器在以下两个方面有所不同:(1)质心是基于不相交的特征子集而不是所有特征定义的;(2)距离是由维数归一化范数而不是欧几里德范数诱导的。我们提供了一些关于我们的方法的理论结果。此外,我们还提出了一种基于自适应k-均值聚类的简单算法,该算法可以发现我们方法中使用的不相交特征子集,并将该算法扩展到特征选择。我们在模拟数据和真实基因表达数据集上评估并比较了我们的方法与其他密切相关的分类器的性能。结果表明,在不同的环境和情况下,我们的方法通过具有较小的误分类率和/或使用较少的特征,能够优于其他竞争分类器。
摘要:In this paper, we develop a new classification method based on nearest
centroid, and it is called the nearest disjoint centroid classifier. Our method
differs from the nearest centroid classifier in the following two aspects: (1)
the centroids are defined based on disjoint subsets of features instead of all
the features, and (2) the distance is induced by the dimensionality-normalized
norm instead of the Euclidean norm. We provide a few theoretical results
regarding our method. In addition, we propose a simple algorithm based on
adapted k-means clustering that can find the disjoint subsets of features used
in our method, and extend the algorithm to perform feature selection. We
evaluate and compare the performance of our method to other closely related
classifiers on both simulated data and real-world gene expression datasets. The
results demonstrate that our method is able to outperform other competing
classifiers by having smaller misclassification rates and/or using fewer
features in various settings and situations.
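A minimal numpy sketch of the classifier itself, assuming the disjoint feature subsets are already given (the paper finds them with an adapted k-means) and taking the dimensionality-normalized norm to be the Euclidean norm scaled by the square root of the subset size.

```python
# Minimal sketch: nearest disjoint centroid classification.
import numpy as np

def fit(X, y, subsets):
    # subsets[k]: index array of the features assigned to class k (disjoint).
    return {k: X[y == k][:, s].mean(axis=0) for k, s in subsets.items()}

def predict(X, centroids, subsets):
    dists = np.stack([
        np.linalg.norm(X[:, subsets[k]] - c, axis=1) / np.sqrt(len(subsets[k]))
        for k, c in centroids.items()], axis=1)   # dimensionality-normalized
    keys = list(centroids)
    return np.array([keys[i] for i in dists.argmin(axis=1)])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = (X[:, :3].sum(axis=1) > 0).astype(int)
subsets = {0: np.arange(0, 3), 1: np.arange(3, 6)}
pred = predict(X, fit(X, y, subsets), subsets)
print((pred == y).mean())
```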
【6】 A Latent Restoring Force Approach to Nonlinear System Identification
标题:非线性系统辨识的一种潜在恢复力方法
链接:https://arxiv.org/abs/2109.10681
作者:Timothy J. Rogers,Tobias Friis
机构:Department of Mechanical Engineering, University of Sheffield, UK; Department of Civil Engineering, Technical University of Denmark
备注:18 pages, 11 figures, preprint submitted to Mechanical Systems and Signal Processing
摘要:非线性动态系统的辨识仍然是工程界面临的重大挑战。这项工作提出了一种基于贝叶斯滤波的方法来提取和识别系统中未知非线性项的贡献,这可以看作是恢复力曲面类型方法的另一种观点。为了实现这种识别,非线性恢复力的贡献最初被建模为时间上的高斯过程。将高斯过程转化为状态空间模型,并与系统的线性动态分量相结合。然后,通过滤波和平滑分布的推断,可以提取系统的内部状态和非线性恢复力。在这些状态下,可以构造一个非线性模型。该方法在模拟案例研究和实验基准数据集上都被证明是有效的。
摘要:Identification of nonlinear dynamic systems remains a significant challenge
across engineering. This work suggests an approach based on Bayesian filtering
to extract and identify the contribution of an unknown nonlinear term in the
system which can be seen as an alternative viewpoint on restoring force surface
type approaches. To achieve this identification, the contribution which is the
nonlinear restoring force is modelled, initially, as a Gaussian process in
time. That Gaussian process is converted into a state-space model and combined
with the linear dynamic component of the system. Then, by inference of the
filtering and smoothing distributions, the internal states of the system and
the nonlinear restoring force can be extracted. In possession of these states a
nonlinear model can be constructed. The approach is demonstrated to be
effective in both a simulated case study and on an experimental benchmark
dataset.
优化|敛散性(3篇)
【1】 ENERO: Efficient Real-Time Routing Optimization
标题:Enero:高效实时路由优化
链接:https://arxiv.org/abs/2109.10883
作者:Paul Almasan,Shihan Xiao,Xiangle Cheng,Xiang Shi,Pere Barlet-Ros,Albert Cabellos-Aparicio
机构:Barcelona Neural Networking Center, Universitat Politecnica de Catalunya, Spain; Network Technology Lab., Huawei Technologies Co., Ltd.
备注:12 pages, 10 figures
摘要:广域网(WAN)是当今社会的关键基础设施。在过去几年中,广域网的网络流量和网络应用程序的数量都有了相当大的增长。为了能够部署紧急网络应用程序(例如,车辆网络、物联网),现有的流量工程(TE)解决方案必须能够实现高性能实时网络运行。此外,TE解决方案必须能够适应动态场景(例如,流量矩阵的变化或拓扑链路故障)。然而,当前的TE技术依赖于手工制作的启发式算法或计算代价高昂的求解器,这不适合高度动态的TE场景。本文提出了一种高效的实时TE引擎Enero。Enero基于两阶段优化过程。在第一阶段,它利用深度强化学习(DRL)通过生成长期TE策略来优化路由配置。我们将图神经网络(GNN)集成到DRL代理中,以在动态网络上实现高效的TE。在第二阶段,Enero使用局部搜索算法来改进DRL的解,而不增加优化过程的计算开销。Enero提供了性能下限,使网络运营商能够了解DRL代理的最差性能。我们相信,性能的下限将减轻在现实网络场景中部署基于DRL的解决方案的难度。实验结果表明,对于多达100条边的拓扑,Enero能够在实际动态网络拓扑中以平均4.5秒的时间运行。
摘要:Wide Area Networks (WAN) are a key infrastructure in today's society. During
the last years, WANs have seen a considerable increase in network's traffic as
well as in the number of network applications. To enable the deployment of
emergent network applications (e.g., Vehicular networks, Internet of Things),
existing Traffic Engineering (TE) solutions must be able to achieve high
performance real-time network operation. In addition, TE solutions must be able
to adapt to dynamic scenarios (e.g., changes in the traffic matrix or topology
link failures). However, current TE technologies rely on hand-crafted
heuristics or computationally expensive solvers, which are not suitable for
highly dynamic TE scenarios.
In this paper we propose Enero, an efficient real-time TE engine. Enero is
based on a two-stage optimization process. In the first one, it leverages Deep
Reinforcement Learning (DRL) to optimize the routing configuration by
generating a long-term TE strategy. We integrated a Graph Neural Network (GNN)
into the DRL agent to enable efficient TE on dynamic networks. In the second
stage, Enero uses a Local Search algorithm to improve DRL's solution without
adding computational overhead to the optimization process. Enero offers a lower
bound in performance, enabling the network operator to know the worst-case
performance of the DRL agent. We believe that the lower bound in performance
will lighten the path of deploying DRL-based solutions in real-world network
scenarios. The experimental results indicate that Enero is able to operate in
real-world dynamic network topologies in 4.5 seconds on average for topologies
up to 100 edges.
【2】 Learning by Examples Based on Multi-level Optimization
标题:基于多级优化的样例学习
链接:https://arxiv.org/abs/2109.10824
作者:Shentong Mo,Pengtao Xie
机构:Carnegie Mellon University, Pittsburgh, PA , University of California, San Diego, La Jolla, CA
摘要:举例学习是人类学习中一种有效的学习方法,它通过观察相似问题是如何解决的来学习解决新问题。当学生学习一个新主题时,他/她会找出与该新主题相似的范例主题,并学习范例主题以加深对该新主题的理解。我们的目的是研究这种强大的学习技能是否也可以从人类身上借来改进机器学习。在这项工作中,我们提出了一种新的学习方法,称为示例学习(LBE)。我们的方法自动检索一组类似于查询示例的训练示例,并通过使用检索到的示例的类标签来预测查询示例的标签。我们提出了一个三级优化框架来制定LBE,其中包括三个学习阶段:学习连体网络以检索相似的示例;通过利用检索到的相似示例的类标签,学习匹配网络对查询示例进行预测;通过最小化验证损失,学习训练示例之间的“基本事实”相似性。我们开发了一种有效的算法来解决LBE问题,并在各种基准上进行了大量的实验,结果证明了我们的方法在有监督和Few-Shot学习方面的有效性。
摘要:Learning by examples, which learns to solve a new problem by looking into how
similar problems are solved, is an effective learning method in human learning.
When a student learns a new topic, he/she finds out exemplar topics that are
similar to this new topic and studies the exemplar topics to deepen the
understanding of the new topic. We aim to investigate whether this powerful
learning skill can be borrowed from humans to improve machine learning as well.
In this work, we propose a novel learning approach called Learning By Examples
(LBE). Our approach automatically retrieves a set of training examples that are
similar to query examples and predicts labels for query examples by using class
labels of the retrieved examples. We propose a three-level optimization
framework to formulate LBE which involves three stages of learning: learning a
Siamese network to retrieve similar examples; learning a matching network to
make predictions on query examples by leveraging class labels of retrieved
similar examples; learning the ``ground-truth'' similarities between training
examples by minimizing the validation loss. We develop an efficient algorithm
to solve the LBE problem and conduct extensive experiments on various
benchmarks where the results demonstrate the effectiveness of our method on
both supervised and few-shot learning.
【3】 Differentiable Scaffolding Tree for Molecular Optimization
标题:用于分子优化的可微支架树
链接:https://arxiv.org/abs/2109.10469
作者:Tianfan Fu,Wenhao Gao,Cao Xiao,Jacob Yasonik,Connor W. Coley,Jimeng Sun
机构:Georgia Institute of Technology, Massachusetts Institute of Technology, Amplitude, University of Illinois at Urbana-Champaign (∗Equal Contributions)
摘要:功能分子的结构设计,也称为分子优化,是一项重要的化学科学和工程任务,具有重要的应用,如药物发现。深度生成模型和组合优化方法取得了初步成功,但仍然难以直接建模离散化学结构,并且通常严重依赖蛮力枚举。挑战来自分子结构的离散性和不可微性。为了解决这个问题,我们提出了可微支架树(DST),它利用学习到的知识网络将离散的化学结构转换为局部可微的结构。DST通过图神经网络(GNN)反向传播目标属性的导数,实现基于梯度的化学图结构优化。我们的实证研究表明,基于梯度的分子优化是有效的和样本有效的。此外,学习的图形参数还可以提供解释,帮助领域专家理解模型输出。
摘要:The structural design of functional molecules, also called molecular
optimization, is an essential chemical science and engineering task with
important applications, such as drug discovery. Deep generative models and
combinatorial optimization methods achieve initial success but still struggle
with directly modeling discrete chemical structures and often heavily rely on
brute-force enumeration. The challenge comes from the discrete and
non-differentiable nature of molecule structures. To address this, we propose
differentiable scaffolding tree (DST) that utilizes a learned knowledge network
to convert discrete chemical structures to locally differentiable ones. DST
enables a gradient-based optimization on a chemical graph structure by
back-propagating the derivatives from the target properties through a graph
neural network (GNN). Our empirical studies show the gradient-based molecular
optimizations are both effective and sample efficient. Furthermore, the learned
graph parameters can also provide an explanation that helps domain experts
understand the model output.
预测|估计(5篇)
【1】 The First Vision For Vitals (V4V) Challenge for Non-Contact Video-Based Physiological Estimation
标题:首届面向非接触式视频生理估计的Vision for Vitals(V4V)挑战赛
链接:https://arxiv.org/abs/2109.10471
作者:Ambareesh Revanur,Zhihua Li,Umur A. Ciftci,Lijun Yin,Laszlo A. Jeni
机构:Robotics Institute, Carnegie Mellon University; Dept. of Computer Science, Binghamton University
备注:ICCVw'21. V4V Dataset and Challenge: this https URL
摘要:远程医疗有可能缓解COVID-19大流行等公共卫生紧急情况下对援助的高需求。远程光电容积描记术(rPPG),即从视频中无创地估计微血管组织中血容量变化的问题,将非常适合这些情况。在过去几年中,许多研究小组在从数字视频估计心率的远程PPG方法方面取得了快速进展,并获得了令人印象深刻的结果。在存在自发行为、面部表情和光照变化的自然条件下,这些不同方法的表现如何比较,目前相对未知。为了能够比较各种替代方法,第一届Vision for Vitals挑战赛(V4V)提供了一个新的数据集,其中包含与来自不同人群的各种生理信号在时间上对齐的高分辨率视频。在本文中,我们概述了评估协议、使用的数据和结果。V4V将与2021年计算机视觉国际会议(ICCV)同期举行。
摘要:Telehealth has the potential to offset the high demand for help during public
health emergencies, such as the COVID-19 pandemic. Remote Photoplethysmography
(rPPG) - the problem of non-invasively estimating blood volume variations in
the microvascular tissue from video - would be well suited for these
situations. Over the past few years a number of research groups have made rapid
advances in remote PPG methods for estimating heart rate from digital video and
obtained impressive results. How these various methods compare in naturalistic
conditions, where spontaneous behavior, facial expressions, and illumination
changes are present, is relatively unknown. To enable comparisons among
alternative methods, the 1st Vision for Vitals Challenge (V4V) presented a
novel dataset containing high-resolution videos time-locked with varied
physiological signals from a diverse population. In this paper, we outline the
evaluation protocol, the data used, and the results. V4V is to be held in
conjunction with the 2021 International Conference on Computer Vision.
【2】 Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values
标题:无插补公平性:一种针对缺失值的公平预测决策树方法
链接:https://arxiv.org/abs/2109.10431
作者:Haewon Jeong,Hao Wang,Flavio P. Calmon
机构:Harvard University
摘要:我们研究了使用缺失值数据训练机器学习模型的公平性问题。尽管文献中有许多公平性干预方法,但大多数方法都需要完整的训练集作为输入。实际上,数据可能会丢失值,数据丢失模式可能取决于组属性(例如性别或种族)。简单地将现成的公平学习算法应用于插补数据集可能会导致不公平的模型。在本文中,我们首先从理论上分析了使用插补数据集进行训练时的不同歧视风险来源。然后,我们提出了一种基于决策树的综合方法,该方法不需要单独的插补和学习过程。相反,我们训练了一棵不需要显式插补的缺失合并为属性(MIA)的树,并优化了公平正则化目标函数。通过对真实数据集的几次实验,我们证明了我们的方法优于应用于插补数据集的现有公平性干预方法。
摘要:We investigate the fairness concerns of training a machine learning model
using data with missing values. Even though there are a number of fairness
intervention methods in the literature, most of them require a complete
training set as input. In practice, data can have missing values, and data
missing patterns can depend on group attributes (e.g. gender or race). Simply
applying off-the-shelf fair learning algorithms to an imputed dataset may lead
to an unfair model. In this paper, we first theoretically analyze different
sources of discrimination risks when training with an imputed dataset. Then, we
propose an integrated approach based on decision trees that does not require a
separate process of imputation and learning. Instead, we train a tree with
missing incorporated as attribute (MIA), which does not require explicit
imputation, and we optimize a fairness-regularized objective function. We
demonstrate that our approach outperforms existing fairness intervention
methods applied to an imputed dataset, through several experiments on
real-world datasets.
【3】 Physics-informed Neural Networks-based Model Predictive Control for Multi-link Manipulators
标题:基于物理信息神经网络的多连杆机械手模型预测控制
链接:https://arxiv.org/abs/2109.10793
作者:Jonas Nicodemus,Jonas Kneifl,Jörg Fehr,Benjamin Unger
机构:Stuttgart Center for Simulation Science (SC SimTech), University of Stuttgart, Germany; Institute of Engineering and Computational Mechanics, University of Stuttgart, Germany
摘要:我们通过物理信息机器学习方法讨论多体动力学的非线性模型预测控制(NMPC)。物理信息神经网络(PINNs)是一种很有前途的逼近(偏)微分方程的工具。Pinn不适合其原始形式的控制任务,因为它们不是为处理可变控制操作或可变初始值而设计的。因此,我们提出了通过添加控制动作和初始条件作为附加网络输入来增强pinn的想法。随后通过采样策略和零保持假设减少高维输入空间。该策略使控制器设计能够基于一个PINN作为底层系统动力学的近似值。另外一个好处是,通过自动微分可以轻松计算灵敏度,从而产生高效的基于梯度的算法。最后,我们使用基于PINN的MPC来解决复杂机械系统多连杆机械手的跟踪问题,给出了我们的结果。
摘要:We discuss nonlinear model predictive control (NMPC) for multi-body dynamics
via physics-informed machine learning methods. Physics-informed neural networks
(PINNs) are a promising tool to approximate (partial) differential equations.
PINNs are not suited for control tasks in their original form since they are
not designed to handle variable control actions or variable initial values. We
thus present the idea of enhancing PINNs by adding control actions and initial
conditions as additional network inputs. The high-dimensional input space is
subsequently reduced via a sampling strategy and a zero-hold assumption. This
strategy enables the controller design based on a PINN as an approximation of
the underlying system dynamics. The additional benefit is that the
sensitivities are easily computed via automatic differentiation, thus leading
to efficient gradient-based algorithms. Finally, we present our results using
our PINN-based MPC to solve a tracking problem for a complex mechanical system,
a multi-link manipulator.
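The control-augmented PINN idea can be sketched on a toy scalar system: the network takes time, an initial state, and a held control action as inputs, and is trained on the residual of assumed dynamics x' = f(x, u). Everything here (the dynamics, network sizes, sampling ranges) is an illustrative stand-in for the paper's manipulator model.

```python
# Minimal sketch: PINN with control action and initial condition as inputs.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))
f = lambda x, u: -x + u                        # assumed toy dynamics x' = -x + u

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    t = torch.rand(256, 1, requires_grad=True)
    x0 = torch.rand(256, 1) * 2 - 1            # sampled initial conditions
    u = torch.rand(256, 1) * 2 - 1             # sampled (zero-hold) controls
    x = net(torch.cat([t, x0, u], dim=1))
    # Physics residual via automatic differentiation, plus initial condition.
    dxdt = torch.autograd.grad(x, t, torch.ones_like(x), create_graph=True)[0]
    x_at_0 = net(torch.cat([torch.zeros_like(t), x0, u], dim=1))
    loss = ((dxdt - f(x, u)) ** 2).mean() + ((x_at_0 - x0) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```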
【4】 Deep Augmented MUSIC Algorithm for Data-Driven DoA Estimation
标题:数据驱动波达方向估计的深度增广MUSIC算法
链接:https://arxiv.org/abs/2109.10581
作者:Julian P. Merkofer,Guy Revach,Nir Shlezinger,Ruud J. G. van Sloun
机构:School of ECE, Ben-Gurion University of the Negev
备注:Submitted to ICASSP2022
摘要:波达方向(DoA)估计是传感器阵列信号处理中的一项关键任务,产生了各种成功的基于模型(MB)的算法以及最近发展的数据驱动(DD)方法。在经典多信号分类(MUSIC)算法的基础上,提出了一种新的MB/DD混合DoA估计结构。我们的方法通过专门设计的神经结构增强了原始音乐结构的关键方面,允许它克服纯MB方法的某些限制,例如它无法成功地定位相干源。深度增强MUSIC算法的性能优于其原版,分辨率更高。
摘要:Direction of arrival (DoA) estimation is a crucial task in sensor array
signal processing, giving rise to various successful model-based (MB)
algorithms as well as recently developed data-driven (DD) methods. This paper
introduces a new hybrid MB/DD DoA estimation architecture, based on the
classical multiple signal classification (MUSIC) algorithm. Our approach
augments crucial aspects of the original MUSIC structure with specifically
designed neural architectures, allowing it to overcome certain limitations of
the purely MB method, such as its inability to successfully localize coherent
sources. The deep augmented MUSIC algorithm is shown to outperform its
unaltered version with a superior resolution.
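For reference, the classical MUSIC pipeline that the paper augments can be written in a few lines of numpy for a uniform linear array with half-wavelength spacing; the learned components of the hybrid architecture are not shown.

```python
# Minimal sketch: classical MUSIC pseudospectrum on a toy two-source scene.
import numpy as np
from scipy.signal import find_peaks

def music_spectrum(snapshots, n_sources, grid_deg):
    m, n = snapshots.shape
    R = snapshots @ snapshots.conj().T / n          # sample covariance
    eigval, eigvec = np.linalg.eigh(R)              # eigenvalues ascending
    En = eigvec[:, : m - n_sources]                 # noise subspace
    k = np.arange(m)[:, None]
    A = np.exp(1j * np.pi * k * np.sin(np.deg2rad(grid_deg)))  # steering vectors
    return 1.0 / np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)  # peaks at DoAs

rng = np.random.default_rng(1)
true_deg = np.array([-20.0, 35.0]); m, n = 8, 200
A = np.exp(1j * np.pi * np.arange(m)[:, None] * np.sin(np.deg2rad(true_deg)))
S = (rng.standard_normal((2, n)) + 1j * rng.standard_normal((2, n))) / np.sqrt(2)
X = A @ S + 0.1 * (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n)))

grid = np.linspace(-90, 90, 721)
p = music_spectrum(X, n_sources=2, grid_deg=grid)
peaks, _ = find_peaks(p)
print(np.sort(grid[peaks[np.argsort(p[peaks])[-2:]]]))   # ~ [-20, 35]
```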
【5】 Learned Benchmarks for Subseasonal Forecasting
标题:亚季节预报的学习基准
链接:https://arxiv.org/abs/2109.10399
作者:Soukayna Mouatadid,Paulo Orenstein,Genevieve Flaspohler,Miruna Oprescu,Judah Cohen,Franklyn Wang,Sean Knight,Maria Geogdzhayeva,Sam Levang,Ernest Fraenkel,Lester Mackey
机构:Department of Computer Science, University of Toronto, Toronto, ON, Canada, Instituto de Matemática Pura e Aplicada, Rio de Janeiro, Brazil, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA
备注:15 pages of main paper and 18 pages of appendix text
摘要:我们开发了一个由简单的学习型基准模型组成的次季节预报工具包,其性能优于业务实践以及最先进的机器学习和深度学习方法。我们的新模型包括:(a)Climatology++,一种气候学的自适应替代方法,对于降水而言,比美国业务化气候预报系统(CFSv2)精度高9%、技巧评分高250%;(b)CFSv2++,一种学习得到的CFSv2校正,可将温度和降水精度提高7-8%,技巧评分提高50-275%;(c)Persistence++,一种增强的持续性模型,它将CFSv2预报与滞后观测相结合,将温度和降水精度提高6-9%,技巧评分提高40-130%。在美国本土范围内,我们的Climatology++、CFSv2++和Persistence++工具包始终优于标准气象基线、最先进的机器学习和深度学习方法以及欧洲中期天气预报中心的集合预报。总的来说,我们发现,用学习到的增强来扩充传统预报方法,可以为构建下一代次季节预报基准提供一种有效且计算成本低廉的策略。
摘要:We develop a subseasonal forecasting toolkit of simple learned benchmark
models that outperform both operational practice and state-of-the-art machine
learning and deep learning methods. Our new models include (a) Climatology++,
an adaptive alternative to climatology that, for precipitation, is 9% more
accurate and 250% more skillful than the United States operational Climate
Forecasting System (CFSv2); (b) CFSv2++, a learned CFSv2 correction that
improves temperature and precipitation accuracy by 7-8% and skill by 50-275%;
and (c) Persistence++, an augmented persistence model that combines CFSv2
forecasts with lagged measurements to improve temperature and precipitation
accuracy by 6-9% and skill by 40-130%. Across the contiguous U.S., our
Climatology++, CFSv2++, and Persistence++ toolkit consistently outperforms
standard meteorological baselines, state-of-the-art machine and deep learning
methods, and the European Centre for Medium-Range Weather Forecasts ensemble.
Overall, we find that augmenting traditional forecasting approaches with
learned enhancements yields an effective and computationally inexpensive
strategy for building the next generation of subseasonal forecasting
benchmarks.
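The Persistence++ recipe reduces to a regression that combines a physical forecast with lagged observations. A minimal sketch on synthetic data, where the stand-in "cfsv2" column, the lags, and the train/test split are invented for illustration.

```python
# Minimal sketch: learned combination of a physical forecast and lagged obs.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
T = 500
truth = np.sin(np.arange(T) / 20) + 0.1 * rng.normal(size=T)   # toy temperature
cfsv2 = truth + 0.3 * rng.normal(size=T)                        # noisy forecast
lag1, lag2 = np.roll(truth, 14), np.roll(truth, 28)             # lagged obs

X = np.column_stack([cfsv2, lag1, lag2])[28:]   # drop wrapped-around rows
y = truth[28:]
model = LinearRegression().fit(X[:-100], y[:-100])
print(model.score(X[-100:], y[-100:]))          # out-of-sample R^2
```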
其他神经网络|深度学习|模型|建模(19篇)
【1】 SoK: Machine Learning Governance
标题:SOK:机器学习治理
链接:https://arxiv.org/abs/2109.10870
作者:Varun Chandrasekaran,Hengrui Jia,Anvith Thudi,Adelin Travers,Mohammad Yaghini,Nicolas Papernot
机构:University of Toronto‡, Vector Institute§, University of Wisconsin-Madison†
摘要:机器学习(ML)在计算机系统中的应用不仅给社会带来了许多好处,也带来了风险。在本文中,我们发展了ML治理的概念,以平衡这些利益和风险,目的是实现负责任的ML应用。我们的方法首先将确定数据和模型所有权的研究系统化,从而培养了ML系统特有的身份概念。在此基础上,我们使用身份来保持主体负责ML系统的失败,通过归因和审计。为了增加对ML系统的信任,我们随后调查了用于开发保证的技术,即系统满足其安全要求且不会出现某些已知故障的信心。这使我们强调了对允许模型所有者管理其系统生命周期的技术的需求,例如,修补或停用其ML系统。总而言之,我们的知识系统化标准化了在ML的整个生命周期中参与部署的主体之间的交互。我们强调了未来工作的机会,例如,将ML主体之间的博弈形式化。
摘要:The application of machine learning (ML) in computer systems introduces not
only many benefits but also risks to society. In this paper, we develop the
concept of ML governance to balance such benefits and risks, with the aim of
achieving responsible applications of ML. Our approach first systematizes
research towards ascertaining ownership of data and models, thus fostering a
notion of identity specific to ML systems. Building on this foundation, we use
identities to hold principals accountable for failures of ML systems through
both attribution and auditing. To increase trust in ML systems, we then survey
techniques for developing assurance, i.e., confidence that the system meets its
security requirements and does not exhibit certain known failures. This leads
us to highlight the need for techniques that allow a model owner to manage the
life cycle of their system, e.g., to patch or retire their ML system. Put
altogether, our systematization of knowledge standardizes the interactions
between principals involved in the deployment of ML throughout its life cycle.
We highlight opportunities for future work, e.g., to formalize the resulting
game between ML principals.
【2】 Small-Bench NLP: Benchmark for small single GPU trained models in Natural Language Processing
标题:Small-Bench NLP:自然语言处理中小型单GPU训练模型的基准
链接:https://arxiv.org/abs/2109.10847
作者:Kamal Raj Kanakarajan,Bhuvana Kundumani,Malaikannan Sankarasubbu
机构:SAAMA AI Research Lab, Chennai, India
摘要:自然语言处理领域的最新进展为我们提供了几种最先进的(SOTA)预训练模型,这些模型可以针对特定任务进行微调。这些大型模型经过数周的大量GPU/TPU训练,拥有数十亿个参数,在基准排行榜上处于领先地位。在本文中,我们讨论了为在单个GPU上训练的具有成本效益和时间效益的小型模型建立基准的必要性。这将使资源受限的研究人员能够在标记化、预训练任务、体系结构、微调方法等方面尝试新颖创新的想法。我们建立了小型工作台NLP,这是在单个GPU上训练的小型高效神经语言模型的基准。小型NLP基准包括公开可用的GLUE数据集上的八个NLP任务和跟踪社区进展的排行榜。我们的ELECTRA DeBERTa(15M参数)小型模型架构的平均得分为81.53,与BERT Base的82.20(110M参数)相当。我们的型号、代码和排行榜可在https://github.com/smallbenchnlp
摘要:Recent progress in the Natural Language Processing domain has given us
several State-of-the-Art (SOTA) pretrained models which can be finetuned for
specific tasks. These large models with billions of parameters trained on
numerous GPUs/TPUs over weeks are leading in the benchmark leaderboards. In
this paper, we discuss the need for a benchmark for cost and time effective
smaller models trained on a single GPU. This will enable researchers with
resource constraints experiment with novel and innovative ideas on
tokenization, pretraining tasks, architecture, fine tuning methods etc. We set
up Small-Bench NLP, a benchmark for small efficient neural language models
trained on a single GPU. Small-Bench NLP benchmark comprises of eight NLP tasks
on the publicly available GLUE datasets and a leaderboard to track the progress
of the community. Our ELECTRA-DeBERTa (15M parameters) small model architecture
achieves an average score of 81.53 which is comparable to that of BERT-Base's
82.20 (110M parameters). Our models, code and leaderboard are available at
https://github.com/smallbenchnlp
【3】 Neural network relief: a pruning algorithm based on neural activity
标题:神经网络减负:一种基于神经活动的剪枝算法
链接:https://arxiv.org/abs/2109.10795
作者:Aleksandr Dekhovich,David M. J. Tax,Marcel H. F. Sluiter,Miguel A. Bessa
机构:Department of Materials Science and Engineering, Delft University of Technology; Pattern Recognition and Bioinformatics Laboratory, Delft University of Technology
摘要:当前的深度神经网络(DNN)是过参数化的,在每个任务的推理过程中会用到其大部分神经元连接。然而,人脑为不同的任务发展出了专门的区域,并且只利用其一小部分神经元连接进行推理。我们提出了一种迭代剪枝策略,引入一个简单的重要性得分度量来停用不重要的连接,以解决DNN中的过参数化问题并调节其激活(放电)模式。目标是找到仍能以相当精度解决给定任务的最少连接数,即一个更简单的子网络。在MNIST上,我们的LeNet架构取得了相当的性能;在CIFAR-10/100和Tiny-ImageNet上,对于VGG和ResNet架构,我们获得了比最新算法显著更高的参数压缩率。对于所考虑的两种不同优化器(Adam和SGD),我们的方法同样表现良好。就当前的硬件和软件实现而言,该算法并非为最小化浮点运算量(FLOPs)而设计,尽管与最新技术相比其表现也不差。
摘要:Current deep neural networks (DNNs) are overparameterized and use most of
their neuronal connections during inference for each task. The human brain,
however, developed specialized regions for different tasks and performs
inference with a small fraction of its neuronal connections. We propose an
iterative pruning strategy introducing a simple importance-score metric that
deactivates unimportant connections, tackling overparameterization in DNNs and
modulating the firing patterns. The aim is to find the smallest number of
connections that is still capable of solving a given task with comparable
accuracy, i.e. a simpler subnetwork. We achieve comparable performance for
LeNet architectures on MNIST, and significantly higher parameter compression
than state-of-the-art algorithms for VGG and ResNet architectures on
CIFAR-10/100 and Tiny-ImageNet. Our approach also performs well for the two
different optimizers considered -- Adam and SGD. The algorithm is not designed
to minimize FLOPs when considering current hardware and software
implementations, although it performs reasonably when compared to the state of
the art.
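One round of the iterative idea can be sketched with a generic importance score; the score used here, |weight| times the mean absolute input activation, is a simple stand-in for the paper's metric, and the accuracy check that decides when to stop pruning is omitted.

```python
# Minimal sketch: activity-based connection pruning for one linear layer.
import numpy as np

def importance(W, acts):
    # W: (out, in) weights; acts: (batch, in) activations feeding this layer.
    return np.abs(W) * np.abs(acts).mean(axis=0)      # broadcast over rows

def prune_step(W, acts, frac=0.1):
    score = importance(W, acts)
    thresh = np.quantile(score[W != 0], frac)         # bottom of remaining
    W = W.copy()
    W[score <= thresh] = 0.0                          # deactivate connections
    return W

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 32))
acts = rng.normal(size=(256, 32))
for _ in range(5):                                    # ~41% of weights removed
    W = prune_step(W, acts)
print((W == 0).mean())
```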
【4】 Label Cleaning Multiple Instance Learning: Refining Coarse Annotations on Single Whole-Slide Images
标题:标签清理多实例学习:细化单个完整幻灯片图像上的粗略注释
链接:https://arxiv.org/abs/2109.10778
作者:Zhenzhen Wang,Aleksander S. Popel,Jeremias Sulam
机构:Mathematical Institute of Data Science, Johns Hopkins University
摘要:在病理样本的全幻灯片图像(WSI)中注释癌区域在临床诊断、生物医学研究和机器学习算法开发中起着关键作用。然而,生成详尽而准确的注释是劳动密集型、挑战性和成本高昂的。仅绘制粗略和近似的注释是一项容易得多的任务,成本更低,并且可以减轻病理学家的工作量。在本文中,我们研究了在数字病理学中细化这些近似注释以获得更精确注释的问题。以前的一些工作已经探索了从这些不准确的注释中获取机器学习模型,但是很少有工作能够解决细化问题,即应该明确地识别和纠正错误标记的区域,并且所有这些工作都需要大量的训练样本。我们提出了一种称为标签清理多实例学习(LC-MIL)的方法,在不需要外部训练数据的情况下对单个WSI上的粗略注释进行细化。使用MIL框架联合处理从带有不准确标签的WSI裁剪的补丁,并利用深度注意机制来区分标记错误的实例,减轻其对预测模型的影响并细化分割。我们对乳腺癌淋巴结转移、肝癌和结直肠癌样本的异质性WSI集进行的实验表明,LC-MIL显著细化了粗略注释,优于最先进的替代方法,即使是从单个幻灯片学习。这些结果表明,LC-MIL是一种很有前途的轻量级工具,可以从粗注释的病理集合中提供细粒度注释。
摘要:Annotating cancerous regions in whole-slide images (WSIs) of pathology
samples plays a critical role in clinical diagnosis, biomedical research, and
machine learning algorithms development. However, generating exhaustive and
accurate annotations is labor-intensive, challenging, and costly. Drawing only
coarse and approximate annotations is a much easier task, less costly, and it
alleviates pathologists' workload. In this paper, we study the problem of
refining these approximate annotations in digital pathology to obtain more
accurate ones. Some previous works have explored obtaining machine learning
models from these inaccurate annotations, but few of them tackle the refinement
problem where the mislabeled regions should be explicitly identified and
corrected, and all of them require a - often very large - number of training
samples. We present a method, named Label Cleaning Multiple Instance Learning
(LC-MIL), to refine coarse annotations on a single WSI without the need of
external training data. Patches cropped from a WSI with inaccurate labels are
processed jointly with a MIL framework, and a deep-attention mechanism is
leveraged to discriminate mislabeled instances, mitigating their impact on the
predictive model and refining the segmentation. Our experiments on a
heterogeneous WSI set with breast cancer lymph node metastasis, liver cancer,
and colorectal cancer samples show that LC-MIL significantly refines the coarse
annotations, outperforming the state-of-the-art alternatives, even while
learning from a single slide. These results demonstrate the LC-MIL is a
promising, lightweight tool to provide fine-grained annotations from coarsely
annotated pathology sets.
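The attention-pooling core of such a MIL model can be sketched in PyTorch: patch features are weighted by learned attention before the bag-level prediction, and the weights give per-patch suspicion scores. Feature sizes are illustrative, and LC-MIL's label-cleaning loop around this module is not shown.

```python
# Minimal sketch: attention-based MIL pooling over the patches of one slide.
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, d=512, h=128):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(d, h), nn.Tanh(), nn.Linear(h, 1))
        self.head = nn.Linear(d, 1)

    def forward(self, bag):                        # bag: (n_patches, d) features
        a = torch.softmax(self.attn(bag), dim=0)   # per-patch attention weights
        z = (a * bag).sum(dim=0)                   # attention-pooled bag vector
        return self.head(z), a.squeeze(1)          # bag logit, patch weights

model = AttentionMIL()
logit, weights = model(torch.randn(200, 512))      # one bag of 200 patches
print(logit.shape, weights.shape)
```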
【5】 CC-Cert: A Probabilistic Approach to Certify General Robustness of Neural Networks
标题:CC-Cert:一种验证神经网络一般鲁棒性的概率方法
链接:https://arxiv.org/abs/2109.10696
作者:Mikhail Pautov,Nurislam Tursynbek,Marina Munkhoeva,Nikita Muravev,Aleksandr Petiushko,Ivan Oseledets
机构:Skolkovo Institute of Science and Technology,Lomonosov Moscow State University,Huawei Moscow Research Center
摘要:在安全关键的机器学习应用程序中,保护模型免受对抗性攻击是至关重要的,这种攻击是对输入的微小修改,从而改变预测。除了经过严格研究的$\ellp$-有界加性扰动外,最近提出的语义扰动(例如旋转、平移)引起了人们对在现实世界中部署ML系统的严重关注。因此,重要的是为深层学习模型提供可证明的保证,防止语义上有意义的输入转换。在本文中,我们提出了一种新的基于Chernoff-Cramer边界的通用概率认证方法,可用于一般攻击环境。如果攻击是从某个分布中抽样的,我们估计模型失败的概率。我们的理论发现得到了不同数据集实验结果的支持。
摘要:In safety-critical machine learning applications, it is crucial to defend
models against adversarial attacks -- small modifications of the input that
change the predictions. Besides rigorously studied $\ell_p$-bounded additive
perturbations, recently proposed semantic perturbations (e.g. rotation,
translation) raise a serious concern on deploying ML systems in real-world.
Therefore, it is important to provide provable guarantees for deep learning
models against semantically meaningful input transformations. In this paper, we
propose a new universal probabilistic certification approach based on
Chernoff-Cramer bounds that can be used in general attack settings. We estimate
the probability of a model to fail if the attack is sampled from a certain
distribution. Our theoretical findings are supported by experimental results on
different datasets.
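The certification idea can be sketched with plain Monte Carlo plus a one-sided Hoeffding bound; the paper derives sharper Chernoff-Cramer-style bounds, so this is only a conservative stand-in. The toy model and attack distribution below are invented for illustration.

```python
# Minimal sketch: estimate failure probability under a sampled attack and
# report a one-sided Hoeffding upper confidence bound (not the paper's bound).
import numpy as np

def certify(model, x, label, sample_attack, n=10_000, delta=1e-3, rng=None):
    rng = rng or np.random.default_rng(0)
    fails = sum(model(sample_attack(x, rng)) != label for _ in range(n))
    p_hat = fails / n
    bound = p_hat + np.sqrt(np.log(1 / delta) / (2 * n))  # holds w.p. >= 1-delta
    return p_hat, bound

# Toy example: a 1-D "classifier" attacked by random input shifts.
model = lambda v: int(v > 0.5)
attack = lambda x, rng: x + rng.normal(scale=0.2)
print(certify(model, x=0.8, label=1, sample_attack=attack))
```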
【6】 Decentralized Learning of Tree-Structured Gaussian Graphical Models from Noisy Data
标题:含噪数据中树状结构高斯图形模型的分散学习
链接:https://arxiv.org/abs/2109.10642
作者:Akram Hussain
机构:Department of Computer Science and Engineering, Shanghai Jiao Tong University
备注:32 pages, there are more authors of this paper
摘要:本文研究了从含噪数据中分散学习树结构高斯图模型(GGM)的问题。在分散学习中,数据集分布在不同的机器(传感器)上,而GGM被广泛用于建模诸如基因调控网络和社交网络之类的复杂网络。所提出的分散学习使用Chow-Liu算法来估计树结构GGM。在以往的工作中,为简化起见,错误恢复树结构概率的上界大多是在不考虑任何实际噪声的情况下给出的,而本文研究了三种常见噪声信道的影响:高斯信道、擦除信道和二进制对称信道。对于高斯信道情形,为了在恢复$d$节点树结构时满足失败概率上界$\delta > 0$,我们提出的定理对最小样本量($n$)只需要$\mathcal{O}(\log(\frac{d}{\delta}))$个样本;相比之下,先前文献(Nikolakakis等)在采用文献中一些重要工作所使用的正相关系数假设的情况下需要$\mathcal{O}(\log^4(\frac{d}{\delta}))$个样本。此外,近似有界高斯随机变量假设并未出现在该文献中。在给定关于树结构的一些知识时,与公式化界相比,所提出的算法化界在小样本量(例如$< 2000$)下将取得明显更好的性能。最后,我们通过在合成数据集上进行仿真来验证我们的理论结果。
摘要:This paper studies the decentralized learning of tree-structured Gaussian
graphical models (GGMs) from noisy data. In decentralized learning, data set is
distributed across different machines (sensors), and GGMs are widely used to
model complex networks such as gene regulatory networks and social networks.
The proposed decentralized learning uses the Chow-Liu algorithm for estimating
the tree-structured GGM.
In previous works, upper bounds on the probability of incorrect tree
structure recovery were mostly given without any practical noise, for
simplification. This paper instead investigates the effects of three common
types of noisy channels: Gaussian, erasure, and binary symmetric. For
Gaussian channel case, to satisfy the failure probability upper bound $\delta >
0$ in recovering a $d$-node tree structure, our proposed theorem requires only
$\mathcal{O}(\log(\frac{d}{\delta}))$ samples for the smallest sample size
($n$), compared to the previous literature \cite{Nikolakakis} with
$\mathcal{O}(\log^4(\frac{d}{\delta}))$ samples by using the positive
correlation coefficient assumption that is used in some important works in the
literature. Moreover, the approximately bounded Gaussian random variable
assumption does not appear in \cite{Nikolakakis}. Given some knowledge about
the tree structure, the proposed Algorithmic Bound will achieve obviously
better performance with small sample sizes (e.g., $< 2000$) compared with
formulaic bounds. Finally, we validate our theoretical results by performing
simulations on synthetic data sets.
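For reference, the centralized Chow-Liu step is short: pairwise mutual information (for Gaussian variables, I = -0.5 log(1 - rho^2)) followed by a maximum-weight spanning tree. A minimal sketch that ignores the decentralization and channel-noise aspects analyzed in the paper.

```python
# Minimal sketch: Chow-Liu tree via mutual information + max spanning tree.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def chow_liu_tree(X):
    rho = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(rho, 0.0)
    mi = -0.5 * np.log1p(-rho ** 2)          # Gaussian mutual information
    mst = minimum_spanning_tree(-mi)         # negate weights: max spanning tree
    return np.transpose(np.nonzero(mst.toarray()))   # recovered edges (i, j)

# Toy chain 0-1-2-3: the recovered tree should be exactly these edges.
rng = np.random.default_rng(0)
n = 5000
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + 0.6 * rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
x3 = 0.8 * x2 + 0.6 * rng.normal(size=n)
print(chow_liu_tree(np.column_stack([x0, x1, x2, x3])))
```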
【7】 Emulating Aerosol Microphysics with Machine Learning
标题:用机器学习模拟气溶胶微物理
链接:https://arxiv.org/abs/2109.10593
作者:Paula Harder,Duncan Watson-Parris,Dominik Strassel,Nicolas Gauger,Philip Stier,Janis Keuper
机构:Institute for Machine Learning and Analytics (IMLA), Offenburg University
摘要:气溶胶粒子通过吸收和散射辐射并影响云的性质,在气候系统中发挥着重要作用。它们也是气候建模不确定性的最大来源之一。许多气候模型没有足够详细地包括气溶胶。为了获得更高的精度,必须考虑气溶胶的微物理特性和过程。这是在使用M7微物理模型的ECHAM-HAM全球气候气溶胶模型中完成的,但计算成本的增加使得以更高分辨率或更长时间运行非常昂贵。我们的目标是使用机器学习以足够的精度逼近微物理模型,并通过快速推理来降低计算成本。原始M7模型用于生成输入-输出对的数据,以在其上训练神经网络。通过使用特殊的对数变换,我们能够学习变量的趋势,平均$R^2$得分为$89\%$。在GPU上,与原模型相比,我们实现了120倍的加速。
摘要:Aerosol particles play an important role in the climate system by absorbing
and scattering radiation and influencing cloud properties. They are also one of
the biggest sources of uncertainty for climate modeling. Many climate models do
not include aerosols in sufficient detail. In order to achieve higher accuracy,
aerosol microphysical properties and processes have to be accounted for. This
is done in the ECHAM-HAM global climate aerosol model using the M7 microphysics
model, but increased computational costs make it very expensive to run at
higher resolutions or for a longer time. We aim to use machine learning to
approximate the microphysics model at sufficient accuracy and reduce the
computational cost by being fast at inference time. The original M7 model is
used to generate data of input-output pairs to train a neural network on it. By
using a special logarithmic transform we are able to learn the variables'
tendencies, achieving an average $R^2$ score of $89\%$. On a GPU we achieve a
speed-up of 120 compared to the original model.
【8】 The Curse Revisited: a Newly Quantified Concept of Meaningful Distances for Learning from High-Dimensional Noisy Data
标题:诅咒重现:用于从高维噪声数据中学习的有意义距离的新量化概念
链接:https://arxiv.org/abs/2109.10569
作者:Robin Vandaele,Bo Kang,Tijl De Bie,Yvan Saeys
机构:Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium, Data mining and Modelling for Biomedicine (DaMBi), VIB Inflammation Research Center, Gent, Belgium
摘要:数据点之间的距离广泛应用于点云表示学习。然而,在噪声的影响下,这些距离以及基于它们的模型可能会在高维中失去其作用,这已不是什么秘密。事实上,噪声的微小边际效应可能会迅速累积,使经验上的最近邻和最远邻偏离真实情况。在本文中,我们使用渐近概率表达式来刻画高维数据中的这种效应。此外,尽管此前有人认为,当最远点和最近点之间的相对区分度较差时,近邻查询会变得毫无意义且不稳定,但我们得出结论:当把真实数据与噪声显式分离时,情况并不一定如此。更具体地说,我们推导出,在特定条件下,即使我们观察到这种区分度很差,受噪声影响的经验近邻关系仍然很可能是真实的。我们对结果进行了彻底的实证验证;有趣的是,实验表明,我们推导出的近邻变得随机与否的相变,与常用降维方法在为带有稠密噪声的高维数据寻找低维表示时表现好坏的相变完全一致。
摘要:Distances between data points are widely used in point cloud representation
learning. Yet, it is no secret that under the effect of noise, these
distances-and thus the models based upon them-may lose their usefulness in high
dimensions. Indeed, the small marginal effects of the noise may then accumulate
quickly, shifting empirical closest and furthest neighbors away from the ground
truth. In this paper, we characterize such effects in high-dimensional data
using an asymptotic probabilistic expression. Furthermore, while it has been
previously argued that neighborhood queries become meaningless and unstable
when there is a poor relative discrimination between the furthest and closest
point, we conclude that this is not necessarily the case when explicitly
separating the ground truth data from the noise. More specifically, we derive
that under particular conditions, empirical neighborhood relations affected by
noise are still likely to be true even when we observe this discrimination to
be poor. We include thorough empirical verification of our results, as well as
experiments that interestingly show our derived phase shift where neighbors
become random or not is identical to the phase shift where common
dimensionality reduction methods perform poorly or well for finding
low-dimensional representations of high-dimensional data with dense noise.
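The distance-concentration effect discussed above is easy to reproduce. A minimal numpy experiment, with sizes chosen arbitrarily, that prints the relative contrast (d_max - d_min)/d_min of a query's distances as the dimension grows.

```python
# Minimal experiment: relative contrast of neighbor distances under pure noise.
import numpy as np

rng = np.random.default_rng(0)
for d in [2, 10, 100, 1000, 10000]:
    X = rng.normal(size=(500, d))            # noise-only point cloud
    q = rng.normal(size=d)                   # query point
    dist = np.linalg.norm(X - q, axis=1)
    contrast = (dist.max() - dist.min()) / dist.min()
    print(f"d={d:>6}  relative contrast={contrast:.3f}")
```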
【9】 Investigating and Modeling the Dynamics of Long Ties
Link: https://arxiv.org/abs/2109.10523
Authors: Ding Lyu,Yuan Yuan,Lin Wang,Xiaofan Wang,Alex Pentland
Affiliations: Department of Automation, Shanghai Jiao Tong University; Connection Science, Massachusetts Institute of Technology; Krannert School of Management, Purdue University; Department of Automation, Shanghai University; Media Lab, Massachusetts Institute of Technology
Note: 46 pages, 18 figures
Abstract: Long ties, the social ties that bridge different communities, are widely
believed to play crucial roles in spreading novel information in social
networks. However, some existing network theories and prediction models
indicate that long ties might dissolve quickly or eventually become redundant,
thus putting into question the long-term value of long ties. Our empirical
analysis of real-world dynamic networks shows that contrary to such reasoning,
long ties are more likely to persist than other social ties, and that many of
them constantly function as social bridges without being embedded in local
networks. Using a novel cost-benefit analysis model combined with machine
learning, we show that long ties are highly beneficial, which instinctively
motivates people to expend extra effort to maintain them. This partly explains
why long ties are more persistent than what has been suggested by many existing
theories and models. Overall, our study suggests the need for social
interventions that can promote the formation of long ties, such as mixing
people with diverse backgrounds.
【10】 Tecnologica cosa: Modeling Storyteller Personalities in Boccaccio's Decameron
Link: https://arxiv.org/abs/2109.10506
Authors: A. Feder Cooper,Maria Antoniak,Christopher De Sa,Marilyn Migiel,David Mimno
Affiliations: Bowers College of Computing and Information Science, Cornell University; Department of Romance Studies, Cornell University
Note: The 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (co-located with EMNLP 2021)
Abstract: We explore Boccaccio's Decameron to see how digital humanities tools can be
used for tasks that have limited data in a language no longer in contemporary
use: medieval Italian. We focus our analysis on the question: Do the different
storytellers in the text exhibit distinct personalities? To answer this
question, we curate and release a dataset based on the authoritative edition of
the text. We use supervised classification methods to predict storytellers
based on the stories they tell, confirming the difficulty of the task, and
demonstrate that topic modeling can extract thematic storyteller "profiles."
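The "thematic storyteller profile" idea can be sketched with standard tooling. This is a hypothetical stand-in (placeholder texts and an sklearn LDA model), not the paper's actual pipeline or dataset:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Fit a topic model over stories, then average each storyteller's topic
# mixtures into a per-narrator "profile". Texts/names are placeholders.
stories = ["love and fortune ...", "a clever reply ...", "a tragic end ..."]
narrators = ["Pampinea", "Dioneo", "Filomena"]

counts = CountVectorizer(max_features=5000).fit_transform(stories)
lda = LatentDirichletAllocation(n_components=10, random_state=0)
doc_topics = lda.fit_transform(counts)                # (n_stories, n_topics)

profiles = {}
for name in set(narrators):
    mask = np.array([n == name for n in narrators])
    profiles[name] = doc_topics[mask].mean(axis=0)    # storyteller profile
```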
【11】 Scalable and Efficient MoE Training for Multitask Multilingual Models
Link: https://arxiv.org/abs/2109.10465
Authors: Young Jin Kim,Ammar Ahmad Awan,Alexandre Muzio,Andres Felipe Cruz Salinas,Liyang Lu,Amr Hendy,Samyam Rajbhandari,Yuxiong He,Hany Hassan Awadalla
Affiliations: Microsoft, One Microsoft Way, Redmond, WA, USA
Abstract: The Mixture of Experts (MoE) models are an emerging class of sparsely
activated deep learning models that have sublinear compute costs with respect
to their parameters. In contrast with dense models, the sparse architecture of
MoE offers opportunities for drastically growing the model size with significant
accuracy gains while consuming a much lower compute budget. However, supporting
large scale MoE training also has its own set of system and modeling
challenges. To overcome the challenges and embrace the opportunities of MoE, we
first develop a system capable of scaling MoE models efficiently to trillions
of parameters. It combines multi-dimensional parallelism and heterogeneous
memory technologies harmoniously with MoE to empower 8x larger models on the
same hardware compared with existing work. Besides boosting system efficiency,
we also present new training methods to improve MoE sample efficiency and
leverage expert pruning strategy to improve inference time efficiency. By
combining the efficient system and training methods, we are able to
significantly scale up large multitask multilingual models for language
generation which results in a great improvement in model accuracy. A model
trained with 10 billion parameters on 50 languages can achieve state-of-the-art
performance in Machine Translation (MT) and multilingual natural language
generation tasks. The system support of efficient MoE training has been
implemented and open-sourced with the DeepSpeed library.
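Why compute grows sublinearly with parameters is easiest to see in the routing step. A minimal numpy sketch of top-1 gating (illustrative sizes, not the paper's configuration or the DeepSpeed implementation):

```python
import numpy as np

# Sparse MoE routing with top-1 gating: each token visits only one expert,
# so adding experts adds parameters without adding per-token compute.
rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 64, 8, 32
W_gate = rng.normal(size=(d_model, n_experts)) / np.sqrt(d_model)
experts = [rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
           for _ in range(n_experts)]

x = rng.normal(size=(n_tokens, d_model))
logits = x @ W_gate
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
top1 = probs.argmax(axis=1)                  # one expert per token

y = np.empty_like(x)
for e in range(n_experts):                   # only routed tokens touch expert e
    sel = top1 == e
    y[sel] = (x[sel] @ experts[e]) * probs[sel, e:e+1]
```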
【12】 Selecting Datasets for Evaluating an Enhanced Deep Learning Framework
Link: https://arxiv.org/abs/2109.10442
Authors: Kudakwashe Dandajena,Isabella M. Venter,Mehrdad Ghaziasgar,Reg Dodds
Affiliations: Department of Computer Science, University of the Western Cape
Note: 5 pages, 2 figures, Submitted to SATNAC 2021, Drakensberg, South Africa
Abstract: A framework was developed to address limitations associated with existing
techniques for analysing sequences. This work deals with the steps followed to
select suitable datasets characterised by discrete irregular sequential
patterns. To identify, select, explore and evaluate suitable datasets from
various sources, extracted from more than 400 research articles, an
interquartile-range method for outlier calculation and a qualitative Billauer's
algorithm were adapted to provide periodic peak detection in such datasets.
The developed framework was then tested using the most appropriate datasets.
The research concluded that the financial-market daily currency exchange
domain is the most suitable kind of dataset for the evaluation of the designed
deep learning framework, as it provides high levels of discrete irregular
patterns.
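Both screening tools named above are simple to sketch. The following is my own reading (a textbook IQR rule and a Billauer-style alternating peak detector), not the authors' exact adaptation:

```python
import numpy as np

def iqr_outliers(x, k=1.5):
    # Interquartile-range rule: flag points outside [Q1 - k*IQR, Q3 + k*IQR].
    q1, q3 = np.percentile(x, [25, 75])
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return (x < lo) | (x > hi)

def peakdet(x, delta):
    """Indices of local maxima, Billauer-style: alternate between hunting
    for a maximum and a minimum, confirming each once the signal moves by
    at least `delta` in the opposite direction."""
    maxima, mn, mx, mx_i, look_for_max = [], np.inf, -np.inf, 0, True
    for i, v in enumerate(x):
        if v > mx: mx, mx_i = v, i
        if v < mn: mn = v
        if look_for_max and v < mx - delta:
            maxima.append(mx_i); mn = v; look_for_max = False
        elif not look_for_max and v > mn + delta:
            mx, mx_i = v, i; look_for_max = True
    return np.array(maxima)

rates = np.sin(np.linspace(0, 20, 500)) + 0.1 * np.random.default_rng(2).normal(size=500)
print(iqr_outliers(rates).sum(), peakdet(rates, delta=0.5))
```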
【13】 Imitation Learning of Stabilizing Policies for Nonlinear Systems
Link: https://arxiv.org/abs/2109.10854
Authors: Sebastian East
Affiliations: Department of Aerospace Engineering, University of Bristol
Abstract: There has been a recent interest in imitation learning methods that are
guaranteed to produce a stabilizing control law with respect to a known system.
Work in this area has generally considered linear systems and controllers, for
which stabilizing imitation learning takes the form of a biconvex optimization
problem. In this paper it is demonstrated that the same methods developed for
linear systems and controllers can be readily extended to polynomial systems
and controllers using sum of squares techniques. A projected gradient descent
algorithm and an alternating direction method of multipliers algorithm are
proposed as heuristics for solving the stabilizing imitation learning problem,
and their performance is illustrated through numerical experiments.
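The projected-gradient-descent heuristic has a simple generic shape: step on the imitation loss, then project the controller back into a feasible set. A sketch under my own simplifications (quadratic loss, norm-ball projection; the paper's actual constraint set comes from sum-of-squares certificates):

```python
import numpy as np

# Stand-in problem: fit linear controller K to expert data (X, U) subject
# to a toy "stability" constraint ||K|| <= 1, enforced by projection.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))          # states visited by the expert
U = X @ rng.normal(size=(4, 1))        # expert control actions

K = np.zeros((4, 1))
for _ in range(500):
    grad = 2 * X.T @ (X @ K - U) / len(X)   # gradient of the imitation loss
    K -= 0.05 * grad
    norm = np.linalg.norm(K)
    if norm > 1.0:                     # projection onto the constraint set
        K /= norm
```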
【14】 SCSS-Net: Solar Corona Structures Segmentation by Deep Learning
Link: https://arxiv.org/abs/2109.10834
Authors: Šimon Mackovjak,Martin Harman,Viera Maslej-Krešňáková,Peter Butka
Affiliations: Department of Space Physics, Institute of Experimental Physics, Slovak Academy of Sciences, Košice, Slovakia; Department of Cybernetics and Artificial Intelligence, Technical University of Košice, Košice, Slovakia
Note: accepted for publication in Monthly Notices of the Royal Astronomical Society; for associated code, see this https URL
Abstract: Structures in the solar corona are the main drivers of space weather
processes that might directly or indirectly affect the Earth. Thanks to the
most recent space-based solar observatories, with capabilities to acquire
high-resolution images continuously, the structures in the solar corona can be
monitored over the years with a time resolution of minutes. For this purpose,
we have developed a method for automatic segmentation of solar corona
structures observed in EUV spectrum that is based on a deep learning approach
utilizing Convolutional Neural Networks. The available input datasets have been
examined together with our own dataset based on the manual annotation of the
target structures. Indeed, the input dataset is the main limitation of the
developed model's performance. Our \textit{SCSS-Net} model provides results for
coronal holes and active regions that could be compared with other generally
used methods for automatic segmentation. Moreover, it provides a universal
procedure to identify structures in the solar corona with the help of the
transfer learning technique. The outputs of the model can then be used for
further statistical studies of connections between solar activity and the
influence of space weather on Earth.
【15】 An artificial neural network approach to bifurcating phenomena in computational fluid dynamics
Link: https://arxiv.org/abs/2109.10765
Authors: Federico Pichi,Francesco Ballarin,Gianluigi Rozza,Jan S. Hesthaven
Affiliations: Department of Mathematics and Physics, Catholic University of the Sacred Heart
Note: 28 pages, 22 figures
Abstract: This work deals with the investigation of bifurcating fluid phenomena using a
reduced order modelling setting aided by artificial neural networks. We discuss
the POD-NN approach dealing with non-smooth solution sets of nonlinear
parametrized PDEs. Thus, we study the Navier-Stokes equations describing: (i)
the Coanda effect in a channel, and (ii) the lid driven triangular cavity flow,
in a physical/geometrical multi-parametrized setting, considering the effects
of the domain's configuration on the position of the bifurcation points.
Finally, we propose a reduced manifold-based bifurcation diagram for a
non-intrusive recovery of the critical points evolution. Exploiting such
detection tool, we are able to efficiently obtain information about the pattern
flow behaviour, from symmetry breaking profiles to attaching/spreading
vortices, even at high Reynolds numbers.
【16】 Application of Video-to-Video Translation Networks to Computational Fluid Dynamics
Link: https://arxiv.org/abs/2109.10679
Authors: Hiromitsu Kigure
Note: Published in Frontiers in Artificial Intelligence
Abstract: In recent years, the evolution of artificial intelligence, especially deep
learning, has been remarkable, and its application to various fields has been
growing rapidly. In this paper, I report the results of the application of
generative adversarial networks (GANs), specifically video-to-video translation
networks, to computational fluid dynamics (CFD) simulations. The purpose of
this research is to reduce the computational cost of CFD simulations with GANs.
The architecture of GANs in this research is a combination of the
image-to-image translation networks (the so-called "pix2pix") and Long
Short-Term Memory (LSTM). It is shown that the results of high-cost and
high-accuracy simulations (with high-resolution computational grids) can be
estimated from those of low-cost and low-accuracy simulations (with
low-resolution grids). In particular, the time evolution of density
distributions in the cases of a high-resolution grid is reproduced from that in
the cases of a low-resolution grid through GANs, and the density inhomogeneity
estimated from the image generated by GANs recovers the ground truth with good
accuracy. Qualitative and quantitative comparisons of the results of the
proposed method with those of several super-resolution algorithms are also
presented.
【17】 Identifying Potential Exomoon Signals with Convolutional Neural Networks
Link: https://arxiv.org/abs/2109.10503
Authors: Alex Teachey,David Kipping
Affiliations: Academia Sinica Institute of Astronomy & Astrophysics, Taipei, Taiwan R.O.C.; Department of Astronomy, Columbia University in the City of New York, USA; Center for Computational Astrophysics, Flatiron Institute, Fifth Avenue, New York, NY, USA
Note: 14 pages, 13 figures, 1 table. Accepted for publication in Monthly Notices of the Royal Astronomical Society, 15 September 2021
Abstract: Targeted observations of possible exomoon host systems will remain difficult
to obtain and time-consuming to analyze in the foreseeable future. As such,
time-domain surveys such as Kepler, K2 and TESS will continue to play a
critical role as the first step in identifying candidate exomoon systems, which
may then be followed-up with premier ground- or space-based telescopes. In this
work, we train an ensemble of convolutional neural networks (CNNs) to identify
candidate exomoon signals in single-transit events observed by Kepler. Our
training set consists of ${\sim}$27,000 examples of synthetic, planet-only and
planet+moon single transits, injected into Kepler light curves. We achieve up
to 88\% classification accuracy with individual CNN architectures and 97\%
precision in identifying the moons in the validation set when the CNN ensemble
is in total agreement. We then apply the CNN ensemble to light curves from 1880
Kepler Objects of Interest with periods $>10$ days ($\sim$57,000 individual
transits), and further test the accuracy of the CNN classifier by injecting
planet transits into each light curve, thus quantifying the extent to which
residual stellar activity may result in false positive classifications. We find
a small fraction of these transits contain moon-like signals, though we caution
against strong inferences of the exomoon occurrence rate from this result. We
conclude by discussing some ongoing challenges to utilizing neural networks for
the exomoon search.
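The basic building block, a 1D CNN that classifies a fixed-length single-transit light curve, can be sketched as follows. The architecture and sizes are my own stand-ins, not the paper's ensemble members:

```python
import torch
import torch.nn as nn

# Illustrative classifier: "planet only" vs "planet + moon" from flux values.
class TransitCNN(nn.Module):
    def __init__(self, n_points=2000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=11, padding=5), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=11, padding=5), nn.ReLU(), nn.MaxPool1d(4),
        )
        self.head = nn.Linear(32 * (n_points // 16), 2)

    def forward(self, flux):                  # flux: (batch, 1, n_points)
        h = self.features(flux)
        return self.head(h.flatten(1))        # logits for the two classes

model = TransitCNN()
logits = model(torch.randn(8, 1, 2000))       # 8 synthetic light curves
probs = logits.softmax(dim=1)                 # ensemble members would be averaged
```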
【18】 Personalized Online Machine Learning
Link: https://arxiv.org/abs/2109.10452
Authors: Ivana Malenica,Rachael V. Phillips,Romain Pirracchio,Antoine Chambaz,Alan Hubbard,Mark J. van der Laan
Affiliations: Division of Biostatistics, University of California, Berkeley; Department of Anesthesia and Perioperative Care, University of California San Francisco; MAP (UMR CNRS), Université de Paris, Paris, France
Abstract: In this work, we introduce the Personalized Online Super Learner (POSL) -- an
online ensembling algorithm for streaming data whose optimization procedure
accommodates varying degrees of personalization. Namely, POSL optimizes
predictions with respect to baseline covariates, so personalization can vary
from completely individualized (i.e., optimization with respect to baseline
covariate subject ID) to many individuals (i.e., optimization with respect to
common baseline covariates). As an online algorithm, POSL learns in real-time.
POSL can leverage a diversity of candidate algorithms, including online
algorithms with different training and update times, fixed algorithms that are
never updated during the procedure, pooled algorithms that learn from many
individuals' time-series, and individualized algorithms that learn from within
a single time-series. POSL's ensembling of this hybrid of base learning
strategies depends on the amount of data collected, the stationarity of the
time-series, and the mutual characteristics of a group of time-series. In
essence, POSL decides whether to learn across samples, through time, or both,
based on the underlying (unknown) structure in the data. For a wide range of
simulations that reflect realistic forecasting scenarios, and in a medical data
application, we examine the performance of POSL relative to other current
ensembling and online learning methods. We show that POSL is able to provide
reliable predictions for time-series data and adjust to changing
data-generating environments. We further cultivate POSL's practicality by
extending it to settings where time-series enter/exit dynamically over
chronological time.
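The generic online-ensembling template behind such methods is short. This sketch uses exponentially weighted averaging over a made-up pool of base learners; it illustrates the idea only and is not POSL's actual meta-learning rule:

```python
import numpy as np

rng = np.random.default_rng(0)

def base_forecasts(t):                 # stand-in pool of candidate learners
    return np.array([np.sin(t / 10), 0.0, t / 100])

weights = np.ones(3) / 3
eta = 0.5                              # learning rate for the weight update
for t in range(200):
    preds = base_forecasts(t)
    ensemble_pred = weights @ preds    # prediction made before y_t arrives
    y_t = np.sin(t / 10) + 0.1 * rng.normal()
    losses = (preds - y_t) ** 2
    weights *= np.exp(-eta * losses)   # downweight poorly performing learners
    weights /= weights.sum()
```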
【19】 Digital Signal Processing Using Deep Neural Networks
Link: https://arxiv.org/abs/2109.10404
Authors: Brian Shevitski,Yijing Watkins,Nicole Man,Michael Girard
Affiliations: Pacific Northwest National Laboratory, Richland, WA, United States
Abstract: Currently there is great interest in the utility of deep neural networks
(DNNs) for the physical layer of radio frequency (RF) communications. In this
manuscript, we describe a custom DNN specially designed to solve problems in
the RF domain. Our model leverages the mechanisms of feature extraction and
attention through the combination of an autoencoder convolutional network with
a transformer network, to accomplish several important communications-network
and digital signal processing (DSP) tasks. We also present a new open dataset
and physical data augmentation model that enables training of DNNs that can
perform automatic modulation classification, infer and correct transmission
channel effects, and directly demodulate baseband RF signals.
Others (11 papers)
【1】 Recursively Summarizing Books with Human Feedback
Link: https://arxiv.org/abs/2109.10862
Authors: Jeff Wu,Long Ouyang,Daniel M. Ziegler,Nisan Stiennon,Ryan Lowe,Jan Leike,Paul Christiano
Affiliations: OpenAI
Abstract: A major challenge for scaling machine learning is training models to perform
tasks that are very difficult or time-consuming for humans to evaluate. We
present progress on this problem on the task of abstractive summarization of
entire fiction novels. Our method combines learning from human feedback with
recursive task decomposition: we use models trained on smaller parts of the
task to assist humans in giving feedback on the broader task. We collect a
large volume of demonstrations and comparisons from human labelers, and
fine-tune GPT-3 using behavioral cloning and reward modeling to do
summarization recursively. At inference time, the model first summarizes small
sections of the book and then recursively summarizes these summaries to produce
a summary of the entire book. Our human labelers are able to supervise and
evaluate the models quickly, despite not having read the entire books
themselves. Our resulting model generates sensible summaries of entire books,
even matching the quality of human-written summaries in a few cases ($\sim5\%$
of books). We achieve state-of-the-art results on the recent BookSum dataset
for book-length summarization. A zero-shot question-answering model using these
summaries achieves state-of-the-art results on the challenging NarrativeQA
benchmark for answering questions about books and movie scripts. We release
datasets of samples from our model.
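The recursive decomposition itself is a simple control flow. A runnable sketch in which `summarize` is a placeholder for a learned summarizer (here it just truncates, so the loop terminates):

```python
# Summarize small sections, concatenate the summaries, and recurse until
# the whole book fits into one summary.
def summarize(text: str, max_len: int = 200) -> str:
    return text[:max_len]              # stand-in for a model call

def chunk(text: str, size: int = 2000):
    return [text[i:i + size] for i in range(0, len(text), size)]

def recursive_summary(text: str, target_len: int = 500) -> str:
    while len(text) > target_len:
        text = " ".join(summarize(c) for c in chunk(text))
    return text

book = "word " * 100_000
print(len(recursive_summary(book)))
```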
【2】 LDC-VAE: A Latent Distribution Consistency Approach to Variational AutoEncoders
Link: https://arxiv.org/abs/2109.10640
Authors: Xiaoyu Chen,Chen Gong,Qiang He,Xinwen Hou,Yu Liu
Affiliations: Institute of Automation, Chinese Academy of Sciences
Abstract: Variational autoencoders (VAEs), as an important aspect of generative models,
have received a lot of research interest and found many successful
applications. However, it is always a challenge to achieve consistency
between the learned latent distribution and the prior latent distribution when
optimizing the evidence lower bound (ELBO), which finally leads to
unsatisfactory performance in data generation. In this paper, we propose a
latent distribution consistency approach to avoid such substantial
inconsistency between the posterior and prior latent distributions in ELBO
optimizing. We name our method as latent distribution consistency VAE
(LDC-VAE). We achieve this purpose by assuming the real posterior distribution
in latent space as a Gibbs form, and approximating it by using our encoder.
However, there is no analytical solution for such a Gibbs posterior, and
traditional approximation methods, such as iterative sampling-based MCMC, are
time-consuming. To address this problem, we use the
Stein Variational Gradient Descent (SVGD) to approximate the Gibbs posterior.
Meanwhile, we use the SVGD to train a sampler net which can obtain efficient
samples from the Gibbs posterior. Comparative studies on the popular image
generation datasets show that our method has achieved comparable or even better
performance than several powerful improvements of VAEs.
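The SVGD update the paper relies on has a compact form: particles follow the kernelized gradient of the log-density plus a repulsive term. A minimal sketch on a toy 1D Gaussian target (target and kernel bandwidth are my own choices):

```python
import numpy as np

def grad_log_p(x):                     # toy target: standard normal
    return -x

def svgd_step(particles, step=0.1, h=0.5):
    diff = particles[:, None] - particles[None, :]
    k = np.exp(-diff ** 2 / (2 * h ** 2))          # RBF kernel matrix
    grad_k = -diff / h ** 2 * k                    # d k(x_j, x_i) / d x_j
    # phi(x_i) = mean_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    phi = (k @ grad_log_p(particles) + grad_k.sum(axis=0)) / len(particles)
    return particles + step * phi

particles = np.random.default_rng(0).uniform(-6, 6, size=50)
for _ in range(500):
    particles = svgd_step(particles)
print(particles.mean(), particles.std())          # approx 0 and 1
```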
【3】 Fully probabilistic design for knowledge fusion between Bayesian filters under uniform disturbances
Link: https://arxiv.org/abs/2109.10596
Authors: Lenka Kuklišová Pavelková,Ladislav Jirsa,Anthony Quinn
Affiliations: Czech Academy of Sciences, Institute of Information Theory and Automation, Pod vodárenskou věží, Prague, Czech Republic; Trinity College Dublin, the University of Dublin, Ireland
Note: 39 pages
Abstract: This paper considers the problem of Bayesian transfer learning-based
knowledge fusion between linear state-space processes driven by uniform state
and observation noise processes. The target task conditions on probabilistic
state predictor(s) supplied by the source filtering task(s) to improve its own
state estimate. A joint model of the target and source(s) is not required and
is not elicited. The resulting decision-making problem for choosing the optimal
conditional target filtering distribution under incomplete modelling is solved
via fully probabilistic design (FPD), i.e. via appropriate minimization of
Kullback-Leibler divergence (KLD). The resulting FPD-optimal target learner is
robust, in the sense that it can reject poor-quality source knowledge. In
addition, the fact that this Bayesian transfer learning (BTL) scheme does not
depend on a model of interaction between the source and target tasks ensures
robustness to the misspecification of such a model. The latter is a problem
that affects conventional transfer learning methods. The properties of the
proposed BTL scheme are demonstrated via extensive simulations, and in
comparison with two contemporary alternatives.
【4】 An automatic differentiation system for the age of differential privacy
Link: https://arxiv.org/abs/2109.10573
Authors: Dmitrii Usynin,Alexander Ziller,Moritz Knolle,Daniel Rueckert,Georgios Kaissis
Affiliations: Technical University of Munich; Imperial College London
Note: 8 pages
Abstract: We introduce Tritium, an automatic differentiation-based sensitivity analysis
framework for differentially private (DP) machine learning (ML). Optimal noise
calibration in this setting requires efficient Jacobian matrix computations and
tight bounds on the L2-sensitivity. Our framework achieves these objectives by
relying on a functional analysis-based method for sensitivity tracking, which
we briefly outline. This approach interoperates naturally and seamlessly with
static graph-based automatic differentiation, which enables order-of-magnitude
improvements in compilation times compared to previous work. Moreover, we
demonstrate that optimising the sensitivity of the entire computational graph
at once yields substantially tighter estimates of the true sensitivity compared
to interval bound propagation techniques. Our work naturally befits recent
developments in DP such as individual privacy accounting, aiming to offer
improved privacy-utility trade-offs, and represents a step towards the
integration of accessible machine learning tooling with advanced privacy
accounting systems.
【5】 Index $t$-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings
Link: https://arxiv.org/abs/2109.10538
Authors: Gaëlle Candel,David Naccache
Affiliations: Wordline TSS Labs, Paris; Département d'informatique de l'ENS, ENS, CNRS, PSL University, Paris
Abstract: $t$-SNE is an embedding method that the data science community has widely adopted. Two
interesting characteristics of t-SNE are the structure preservation property
and the answer to the crowding problem, where all neighbors in high dimensional
space cannot be represented correctly in low dimensional space. $t$-SNE
preserves the local neighborhood, and similar items are nicely spaced by
adjusting to the local density. These two characteristics produce a meaningful
representation, where the cluster area is proportional to its size in number,
and relationships between clusters are materialized by closeness on the
embedding.
This algorithm is non-parametric; therefore, two initializations of the
algorithm would lead to two different embeddings. In a forensic approach,
analysts would like to compare two or more datasets using their embedding. An
approach would be to learn a parametric model over an embedding built with a
subset of data. While this approach is highly scalable, points could be mapped
at the same exact position, making them indistinguishable. This type of model
would be unable to adapt to new outliers nor concept drift.
This paper presents a methodology to reuse an embedding to create a new one,
where cluster positions are preserved. The optimization process minimizes two
costs, one relative to the embedding shape and the second relative to the
support embedding match. The proposed algorithm has the same complexity as
the original $t$-SNE to embed new items, and a lower one when considering the
embedding of a dataset sliced into sub-pieces. The method showed promising
results on a real-world dataset, allowing one to observe the birth, evolution
and death of clusters. The proposed approach facilitates identifying
significant trends and changes, and enables monitoring the dynamics of
high-dimensional datasets.
【6】 A unified interpretation of the Gaussian mechanism for differential privacy through the sensitivity index
Link: https://arxiv.org/abs/2109.10528
Authors: Georgios Kaissis,Moritz Knolle,Friederike Jungmann,Alexander Ziller,Dmitrii Usynin,Daniel Rueckert
Note: Under review at PETS 2022
Abstract: The Gaussian mechanism (GM) represents a universally employed tool for
achieving differential privacy (DP), and a large body of work has been devoted
to its analysis. We argue that the three prevailing interpretations of the GM,
namely $(\varepsilon, \delta)$-DP, f-DP and R\'enyi DP can be expressed by
using a single parameter $\psi$, which we term the sensitivity index. $\psi$
uniquely characterises the GM and its properties by encapsulating its two
fundamental quantities: the sensitivity of the query and the magnitude of the
noise perturbation. With strong links to the ROC curve and the
hypothesis-testing interpretation of DP, $\psi$ offers the practitioner a
powerful method for interpreting, comparing and communicating the privacy
guarantees of Gaussian mechanisms.
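One natural reading of the sensitivity index is the ratio of the query's L2-sensitivity to the noise scale, which then determines the whole privacy curve. The sketch below uses that assumption together with the standard analytic Gaussian-mechanism conversion of Balle & Wang (2018), which is related prior work, not this paper's derivation:

```python
import numpy as np
from scipy.stats import norm

# Assumed reading: psi = delta_f / sigma fully characterises the Gaussian
# mechanism. The (epsilon, delta) curve below is the analytic Gaussian
# mechanism expression (Balle & Wang, 2018), used here for illustration.
def delta_for_epsilon(psi, eps):
    return norm.cdf(psi / 2 - eps / psi) - np.exp(eps) * norm.cdf(-psi / 2 - eps / psi)

delta_f, sigma = 1.0, 2.0          # L2-sensitivity and noise scale
psi = delta_f / sigma
for eps in (0.5, 1.0, 2.0):
    print(f"eps={eps}: delta={delta_for_epsilon(psi, eps):.2e}")
```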
【7】 A Spectral Approach to Off-Policy Evaluation for POMDPs
Link: https://arxiv.org/abs/2109.10502
Authors: Yash Nair,Nan Jiang
Affiliations: Harvard University; University of Illinois Urbana-Champaign
Abstract: We consider off-policy evaluation (OPE) in Partially Observable Markov
Decision Processes, where the evaluation policy depends only on observable
variables but the behavior policy depends on latent states (Tennenholtz et al.
(2020a)). Prior work on this problem uses a causal identification strategy
based on one-step observable proxies of the hidden state, which relies on the
invertibility of certain one-step moment matrices. In this work, we relax this
requirement by using spectral methods and extending one-step proxies both into
the past and future. We empirically compare our OPE methods to existing ones
and demonstrate their improved prediction accuracy and greater generality.
Lastly, we derive a separate Importance Sampling (IS) algorithm which relies on
rank, distinctness, and positivity conditions, and not on the strict
sufficiency conditions of observable trajectories with respect to the reward
and hidden-state structure required by Tennenholtz et al. (2020a).
【8】 AI in Osteoporosis
Link: https://arxiv.org/abs/2109.10478
Authors: Sokratis Makrogiannis,Keni Zheng
Affiliations: Division of Physics, Engineering, Mathematics and Computer Science, Delaware State University, Dover, DE
Abstract: In this chapter we explore and evaluate methods for trabecular bone
characterization and osteoporosis diagnosis with increased interest in sparse
approximations. We first describe texture representation and classification
techniques, patch-based methods such as Bag of Keypoints, and more recent deep
neural networks. Then we introduce the concept of sparse representations for
pattern recognition and we detail integrative sparse analysis methods and
classifier decision fusion methods. We report cross-validation results on
osteoporosis datasets of bone radiographs and compare the results produced by
the different categories of methods. We conclude that advances in the AI and
machine learning fields have enabled the development of methods that can be
used as diagnostic tools in clinical settings.
【9】 Achieving Counterfactual Fairness for Causal Bandit
Link: https://arxiv.org/abs/2109.10458
Authors: Wen Huang,Lu Zhang,Xintao Wu
Affiliations: University of Arkansas
Abstract: In online recommendation, customers arrive in a sequential and stochastic
manner from an underlying distribution and the online decision model recommends
a chosen item for each arriving individual based on some strategy. We study how
to recommend an item at each step to maximize the expected reward while
achieving user-side fairness for customers, i.e., customers who share similar
profiles will receive a similar reward regardless of their sensitive attributes
and items being recommended. By incorporating causal inference into bandits and
adopting soft intervention to model the arm selection strategy, we first
propose the d-separation based UCB algorithm (D-UCB) to explore the utilization
of the d-separation set in reducing the amount of exploration needed to achieve
low cumulative regret. Based on that, we then propose the fair causal bandit
(F-UCB) for achieving the counterfactual individual fairness. Both theoretical
analysis and empirical evaluation demonstrate effectiveness of our algorithms.
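Both D-UCB and F-UCB build on the standard UCB template. A generic sketch of that template follows, with made-up reward probabilities and none of the causal d-separation or fairness machinery:

```python
import numpy as np

# Vanilla UCB1: play each arm once, then pick the arm with the highest
# empirical mean plus exploration bonus.
rng = np.random.default_rng(0)
true_reward = np.array([0.3, 0.5, 0.7])
counts, sums = np.zeros(3), np.zeros(3)

for t in range(1, 2001):
    if t <= 3:
        arm = t - 1                          # initialise: play each arm once
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(ucb.argmax())
    reward = rng.random() < true_reward[arm]
    counts[arm] += 1
    sums[arm] += reward

print(counts)                                # the best arm dominates
```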
【10】 Beyond Discriminant Patterns: On the Robustness of Decision Rule Ensembles
Link: https://arxiv.org/abs/2109.10432
Authors: Xin Du,Subramanian Ramamoorthy,Wouter Duivesteijn,Jin Tian,Mykola Pechenizkiy
Affiliations: University of Edinburgh; Eindhoven University of Technology; Iowa State University
Abstract: Local decision rules are commonly understood to be more explainable, due to
the local nature of the patterns involved. With numerical optimization methods
such as gradient boosting, ensembles of local decision rules can gain good
predictive performance on data involving global structure. Meanwhile, machine
learning models are increasingly being used to solve problems in high-stakes
domains, including healthcare and finance. Here, there is an emerging consensus
regarding the need for practitioners to understand whether and how those models
could perform robustly in the deployment environments, in the presence of
distributional shifts. Past research on local decision rules has focused mainly
on maximizing discriminant patterns, without due consideration of robustness
against distributional shifts. In order to fill this gap, we propose a new
method to learn and ensemble local decision rules, that are robust both in the
training and deployment environments. Specifically, we propose to leverage
causal knowledge by regarding the distributional shifts in subpopulations and
deployment environments as the results of interventions on the underlying
system. We propose two regularization terms based on causal knowledge to search
for optimal and stable rules. Experiments on both synthetic and benchmark
datasets show that our method is effective and robust against distributional
shifts in multiple environments.
【11】 RETRONLU: Retrieval Augmented Task-Oriented Semantic Parsing
Link: https://arxiv.org/abs/2109.10410
Authors: Vivek Gupta,Akshat Shrivastava,Adithya Sagar,Armen Aghajanyan,Denis Savenkov
Affiliations: School of Computing, University of Utah; Facebook Conversational AI, Menlo Park
Note: 12 pages, 9 figures, 5 tables
Abstract: While large pre-trained language models accumulate a lot of knowledge in
their parameters, it has been demonstrated that augmenting them with a
non-parametric retrieval-based memory has a number of benefits, from accuracy
improvements to data efficiency, for knowledge-focused tasks such as question
answering. In this paper, we are applying retrieval-based modeling ideas to the
problem of multi-domain task-oriented semantic parsing for conversational
assistants. Our approach, RetroNLU, extends a sequence-to-sequence model
architecture with a retrieval component, used to fetch existing similar
examples and provide them as an additional input to the model. In particular,
we analyze two settings, where we augment an input with (a) retrieved nearest
neighbor utterances (utterance-nn), and (b) ground-truth semantic parses of
nearest neighbor utterances (semparse-nn). Our technique outperforms the
baseline method by 1.5% absolute macro-F1, especially at the low resource
setting, matching the baseline model accuracy with only 40% of the data.
Furthermore, we analyze the nearest neighbor retrieval component's quality,
model sensitivity and break down the performance for semantic parses of
different utterance complexity.
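The retrieval-augmented input construction, embed the incoming utterance, fetch nearest neighbors from an index of training examples, and append either the neighbor utterance or its parse, can be sketched as follows. The embeddings, utterances, and parse strings are placeholders, not RetroNLU's actual encoder or data:

```python
import numpy as np

rng = np.random.default_rng(0)
train_utts = ["set an alarm at 7", "play some jazz", "call mom"]
train_parses = ["[IN:CREATE_ALARM ...]", "[IN:PLAY_MUSIC ...]", "[IN:CALL ...]"]
index = rng.normal(size=(len(train_utts), 128))   # pretend encoder outputs

def embed(utt: str) -> np.ndarray:
    return rng.normal(size=128)                   # placeholder encoder

def retro_input(utt: str, mode: str = "semparse-nn") -> str:
    # Cosine similarity against the index, take the single nearest neighbor.
    q = embed(utt)
    sims = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    nn = int(sims.argmax())
    extra = train_parses[nn] if mode == "semparse-nn" else train_utts[nn]
    return f"{utt} [SEP] {extra}"                 # fed to the seq2seq parser

print(retro_input("wake me up at 6"))
```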