cs.LG: 101 papers today
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (6 papers)
【1】 Weighted Graph-Based Signal Temporal Logic Inference Using Neural Networks
Link: https://arxiv.org/abs/2109.08078
Authors: Nasim Baharisangari, Kazuma Hirota, Ruixuan Yan, Agung Julius, Zhe Xu
Affiliations: The University of Texas at Austin; Ruixuan Yan and Agung Julius are with the Department of Electrical
Note: 6 pages, 1 figure, 1 table
Abstract: Extracting spatial-temporal knowledge from data is useful in many
applications. It is important that the obtained knowledge is
human-interpretable and amenable to formal analysis. In this paper, we propose
a method that trains neural networks to learn spatial-temporal properties in
the form of weighted graph-based signal temporal logic (wGSTL) formulas. For
learning wGSTL formulas, we introduce a flexible wGSTL formula structure in
which the user's preference can be applied in the inferred wGSTL formulas. In
the proposed framework, each neuron of the neural networks corresponds to a
subformula in a flexible wGSTL formula structure. We initially train a neural
network to learn the wGSTL operators and then train a second neural network to
learn the parameters in a flexible wGSTL formula structure. We use a COVID-19
dataset and a rain prediction dataset to evaluate the performance of the
proposed framework and algorithms. We compare the performance of the proposed
framework with three baseline classification methods including K-nearest
neighbors, decision trees, and artificial neural networks. The classification
accuracy obtained by the proposed framework is comparable with the baseline
classification methods.
【2】 Accurately Modeling Biased Random Walks on Weighted Graphs Using $\textit{Node2vec+}$
Link: https://arxiv.org/abs/2109.08031
Authors: Renming Liu, Matthew Hirn, Arjun Krishnan
Affiliations: Department of Computational Mathematics, Science & Engineering; Department of Mathematics; Center for Quantum Computing, Science & Engineering; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
Abstract: Node embedding is a powerful approach for representing the structural role of
each node in a graph. $\textit{Node2vec}$ is a widely used method for node
embedding that works by exploring the local neighborhoods via biased random
walks on the graph. However, $\textit{node2vec}$ does not consider edge weights
when computing walk biases. This intrinsic limitation prevents
$\textit{node2vec}$ from leveraging all the information in weighted graphs and,
in turn, limits its application to many real-world networks that are weighted
and dense. Here, we naturally extend $\textit{node2vec}$ to
$\textit{node2vec+}$ in a way that accounts for edge weights when calculating
walk biases, but which reduces to $\textit{node2vec}$ in the cases of
unweighted graphs or unbiased walks. We empirically show that
$\textit{node2vec+}$ is more robust to additive noise than $\textit{node2vec}$
in weighted graphs using two synthetic datasets. We also demonstrate that
$\textit{node2vec+}$ significantly outperforms $\textit{node2vec}$ on a
commonly benchmarked multi-label dataset (Wikipedia). Furthermore, we test
$\textit{node2vec+}$ against GCN and GraphSAGE using various challenging gene
classification tasks on two protein-protein interaction networks. Despite some
clear advantages of GCN and GraphSAGE, they show comparable performance with
$\textit{node2vec+}$. Finally, $\textit{node2vec+}$ can be used as a general
approach for generating biased random walks, benefiting all existing methods
built on top of $\textit{node2vec}$. $\textit{Node2vec+}$ is implemented as
part of $\texttt{PecanPy}$, which is available at
https://github.com/krishnanlab/PecanPy.
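As a rough illustration of the second-order walk bias that $\textit{node2vec}$ defines and that $\textit{node2vec+}$ extends with edge weights, here is a minimal Python sketch. The `alpha * w` scoring rule is standard node2vec with a weight factor; the extra loose-edge test that node2vec+ applies to the out-bias is only summarized in a comment, so treat this as an assumption-laden sketch rather than the paper's exact procedure.

```python
import random
import networkx as nx

def next_node(G, prev, cur, p=1.0, q=1.0):
    """Sample the next node of a second-order biased walk on a weighted graph.

    Unnormalized probability of candidate x is alpha(prev, x) * w(cur, x).
    node2vec+ further modulates the 1/q out-bias by how strongly x connects
    back to prev relative to x's other edge weights (omitted here).
    """
    nbrs = list(G.neighbors(cur))
    scores = []
    for x in nbrs:
        w = G[cur][x].get("weight", 1.0)
        if x == prev:                 # return to the previous node
            alpha = 1.0 / p
        elif G.has_edge(x, prev):     # stay in prev's neighborhood
            alpha = 1.0
        else:                         # move outward
            alpha = 1.0 / q
        scores.append(alpha * w)
    return random.choices(nbrs, weights=scores, k=1)[0]
```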
【3】 Efficient Scaling of Dynamic Graph Neural Networks
Link: https://arxiv.org/abs/2109.07893
Authors: Venkatesan T. Chakaravarthy, Shivmaran S. Pandian, Saurabh Raje, Yogish Sabharwal, Toyotaro Suzumura, Shashanka Ubaru
Affiliations: IBM Research, India; IBM T.J. Watson Research Center
Note: Conference version to appear in the proceedings of SC'21
Abstract: We present distributed algorithms for training dynamic Graph Neural Networks
(GNN) on large scale graphs spanning multi-node, multi-GPU systems. To the best
of our knowledge, this is the first scaling study on dynamic GNN. We devise
mechanisms for reducing the GPU memory usage and identify two execution time
bottlenecks: CPU-GPU data transfer; and communication volume. Exploiting
properties of dynamic graphs, we design a graph difference-based strategy to
significantly reduce the transfer time. We develop a simple, but effective data
distribution technique under which the communication volume remains fixed and
linear in the input size, for any number of GPUs. Our experiments using
billion-size graphs on a system of 128 GPUs show that: (i) the distribution
scheme achieves up to 30x speedup on 128 GPUs; (ii) the graph-difference
technique reduces the transfer time by a factor of up to 4.1x and the overall
execution time by up to 40%.
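The graph-difference idea admits a very small illustration: rather than moving every snapshot to the GPU in full, only the edge delta between consecutive snapshots is transferred. The function below is a simplified stand-in for the paper's CPU-GPU transfer scheme, with hypothetical snapshot data.

```python
def graph_difference(prev_edges, curr_edges):
    """Return the edges added and removed between two graph snapshots.

    Shipping only (added, removed) instead of the full edge list is the
    intuition behind the graph-difference strategy (a simplified sketch).
    """
    prev, curr = set(prev_edges), set(curr_edges)
    return curr - prev, prev - curr

# Two snapshots of a small dynamic graph
g0 = [(0, 1), (1, 2), (2, 3)]
g1 = [(0, 1), (2, 3), (3, 4)]
added, removed = graph_difference(g0, g1)   # {(3, 4)}, {(1, 2)}
```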
【4】 SPIN Road Mapper: Extracting Roads from Aerial Images via Spatial and Interaction Space Graph Reasoning for Autonomous Driving
Link: https://arxiv.org/abs/2109.07701
Authors: Wele Gedara Chaminda Bandara, Jeya Maria Jose Valanarasu, Vishal M. Patel
Affiliations: Authors are with the Department of Electrical and Computer Engineering, The Johns Hopkins University
Abstract: Road extraction is an essential step in building autonomous navigation
systems. Detecting road segments is challenging as they are of varying widths,
bifurcated throughout the image, and are often occluded by terrain, cloud, or
other weather conditions. Using just convolutional neural networks (ConvNets) for
this problem is not effective as it is inefficient at capturing distant
dependencies between road segments in the image which is essential to extract
road connectivity. To this end, we propose a Spatial and Interaction Space
Graph Reasoning (SPIN) module which, when plugged into a ConvNet, performs
reasoning over graphs constructed on spatial and interaction spaces projected
from the feature maps. Reasoning over spatial space extracts dependencies
between different spatial regions and other contextual information. Reasoning
over a projected interaction space helps in appropriate delineation of roads
from other topographies present in the image. Thus, SPIN extracts long-range
dependencies between road segments and effectively delineates roads from other
semantics. We also introduce a SPIN pyramid which performs SPIN graph reasoning
across multiple scales to extract multi-scale features. We propose a network
based on stacked hourglass modules and SPIN pyramid for road segmentation which
achieves better performance compared to existing methods. Moreover, our method
is computationally efficient and significantly boosts the convergence speed
during training, making it feasible for applying on large-scale high-resolution
aerial images. Code available at:
https://github.com/wgcban/SPIN_RoadMapper.git.
【5】 Comparing Euclidean and Hyperbolic Embeddings on the WordNet Nouns Hypernymy Graph
Link: https://arxiv.org/abs/2109.07488
Authors: Sameer Bansal, Adrian Benton
Affiliations: Bloomberg, Lexington Ave, New York, NY, USA
Abstract: Nickel and Kiela (2017) present a new method for embedding tree nodes in the
Poincare ball, and suggest that these hyperbolic embeddings are far more
effective than Euclidean embeddings at embedding nodes in large, hierarchically
structured graphs like the WordNet nouns hypernymy tree. This is especially
true in low dimensions (Nickel and Kiela, 2017, Table 1). In this work, we seek
to reproduce their experiments on embedding and reconstructing the WordNet
nouns hypernymy graph. Counter to what they report, we find that Euclidean
embeddings are able to represent this tree at least as well as Poincare
embeddings, when allowed at least 50 dimensions. We note that this does not
diminish the significance of their work given the impressive performance of
hyperbolic embeddings in very low-dimensional settings. However, given the wide
influence of their work, our aim here is to present an updated and more
accurate comparison between the Euclidean and hyperbolic embeddings.
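The comparison ultimately rests on ranking candidate edges by distance in each geometry. Below is a minimal sketch of the Poincare ball distance from Nickel and Kiela (2017) next to its Euclidean counterpart, assuming points lie strictly inside the unit ball.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """d(u, v) = arcosh(1 + 2*|u-v|^2 / ((1-|u|^2) * (1-|v|^2)))."""
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq / max(denom, eps))

u, v = np.array([0.1, 0.2]), np.array([0.4, -0.3])
print(poincare_distance(u, v))      # hyperbolic distance
print(np.linalg.norm(u - v))        # Euclidean distance
```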
【6】 RaWaNet: Enriching Graph Neural Network Input via Random Walks on Graphs
Link: https://arxiv.org/abs/2109.07555
Authors: Anahita Iravanizad, Edgar Ivan Sanchez Medina, Martin Stoll
Abstract: In recent years, graph neural networks (GNNs) have gained increasing
popularity and have shown very promising results for data that are represented
by graphs. The majority of GNN architectures are designed based on developing
new convolutional and/or pooling layers that better extract the hidden and
deeper representations of the graphs to be used for different prediction tasks.
The inputs to these layers are mainly the three default descriptors of a graph,
node features $(X)$, adjacency matrix $(A)$, and edge features $(W)$ (if
available). To provide a more enriched input to the network, we propose a
random walk data processing of the graphs based on three selected lengths.
Namely, (regular) walks of length 1 and 2, and a fractional walk of length
$\gamma \in (0,1)$, in order to capture the different local and global dynamics
on the graphs. We also calculate the stationary distribution of each random
walk, which is then used as a scaling factor for the initial node features
($X$). This way, for each graph, the network receives multiple adjacency
matrices along with their individual weighting for the node features. We test
our method on various molecular datasets by passing the processed node features
to the network in order to perform several classification and regression tasks.
Interestingly, our method, while not using the edge features that are heavily
exploited in molecular graph learning, lets a shallow network outperform
well-known deep GNNs.
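A minimal sketch of the enriched input described above: walk matrices of lengths 1 and 2, a fractional power of the adjacency matrix, and the stationary distribution used to scale node features. The eigendecomposition route and the clipping of negative eigenvalues are illustrative assumptions; the paper's exact construction may differ.

```python
import numpy as np

def rawanet_inputs(A, gamma=0.5):
    """Build three walk matrices and a stationary distribution from a
    symmetric (weighted) adjacency matrix A."""
    walk1 = A                            # walks of length 1
    walk2 = A @ A                        # walks of length 2
    # fractional walk A^gamma via eigendecomposition; negative eigenvalues
    # are clipped so the fractional power stays real (a sketch-level choice)
    evals, evecs = np.linalg.eigh(A)
    evals = np.clip(evals, 0.0, None)
    walk_frac = evecs @ np.diag(evals ** gamma) @ evecs.T
    # stationary distribution of the simple random walk: pi_i = deg_i / vol(G)
    deg = A.sum(axis=1)
    pi = deg / deg.sum()
    return walk1, walk2, walk_frac, pi
```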
Transformer (1 paper)
【1】 An End-to-End Transformer Model for 3D Object Detection
Link: https://arxiv.org/abs/2109.08141
Authors: Ishan Misra, Rohit Girdhar, Armand Joulin
Affiliations: Facebook AI Research
Note: Accepted at ICCV 2021
Abstract: We propose 3DETR, an end-to-end Transformer based object detection model for
3D point clouds. Compared to existing detection methods that employ a number of
3D-specific inductive biases, 3DETR requires minimal modifications to the
vanilla Transformer block. Specifically, we find that a standard Transformer
with non-parametric queries and Fourier positional embeddings is competitive
with specialized architectures that employ libraries of 3D-specific operators
with hand-tuned hyperparameters. Nevertheless, 3DETR is conceptually simple and
easy to implement, enabling further improvements by incorporating 3D domain
knowledge. Through extensive experiments, we show 3DETR outperforms the
well-established and highly optimized VoteNet baselines on the challenging
ScanNetV2 dataset by 9.5%. Furthermore, we show 3DETR is applicable to 3D tasks
beyond detection, and can serve as a building block for future research.
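For readers unfamiliar with the non-parametric positional encoding mentioned above, the following sketch maps 3D coordinates to sin/cos Fourier features; the band count and frequency scaling are illustrative assumptions, not 3DETR's exact settings.

```python
import numpy as np

def fourier_positional_embedding(xyz, num_bands=8):
    """Map points (N, 3), assumed scaled to [-1, 1], to Fourier features
    of shape (N, 3 * 2 * num_bands)."""
    freqs = (2.0 ** np.arange(num_bands)) * np.pi        # 2^k * pi
    angles = xyz[:, :, None] * freqs[None, None, :]      # (N, 3, num_bands)
    emb = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return emb.reshape(xyz.shape[0], -1)

points = np.random.uniform(-1, 1, size=(1024, 3))
print(fourier_positional_embedding(points).shape)        # (1024, 48)
```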
GANs | Adversarial | Attacks | Generation (5 papers)
【1】 Zero-Shot Open Information Extraction using Question Generation and Reading Comprehension
Link: https://arxiv.org/abs/2109.08079
Authors: Himanshu Gupta, Amogh Badugu, Tamanna Agrawal, Himanshu Sharad Bhatt
Affiliations: American Express AI Labs, Bangalore, India
Note: 8 pages, 2 figures, 1 algorithm, 7 tables. Accepted in KDD Workshop on Machine Learning in Finance 2021
Abstract: Typically, Open Information Extraction (OpenIE) focuses on extracting
triples, representing a subject, a relation, and the object of the relation.
However, most of the existing techniques are based on a predefined set of
relations in each domain which limits their applicability to newer domains
where these relations may be unknown such as financial documents. This paper
presents a zero-shot open information extraction technique that extracts the
entities (value) and their descriptions (key) from a sentence, using off the
shelf machine reading comprehension (MRC) Model. The input questions to this
model are created using a novel noun phrase generation method. This method
takes the context of the sentence into account and can create a wide variety of
questions making our technique domain independent. Given the questions and the
sentence, our technique uses the MRC model to extract entities (value). The
noun phrase corresponding to the question, with the highest confidence, is
taken as the description (key).
This paper also introduces the EDGAR10-Q dataset which is based on publicly
available financial documents from corporations listed in US securities and
exchange commission (SEC). The dataset consists of paragraphs, tagged values
(entities), and their keys (descriptions) and is one of the largest among
entity extraction datasets. This dataset will be a valuable addition to the
research community, especially in the financial domain. Finally, the paper
demonstrates the efficacy of the proposed technique on the EDGAR10-Q and Ade
corpus drug dosage datasets, where it obtained 86.84% and 97% accuracy,
respectively.
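The key/value extraction loop can be approximated with an off-the-shelf extractive QA model. In this sketch the questions are hand-written and the `deepset/roberta-base-squad2` checkpoint is an assumed stand-in for the MRC component; the paper instead generates the questions automatically from noun phrases.

```python
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

sentence = ("Total revenue increased to $12.3 million for the quarter "
            "ended March 31, 2020.")
questions = ["What was the total revenue?", "When did the quarter end?"]

for q in questions:
    out = qa(question=q, context=sentence)
    # the extracted span is the value; the phrase behind the question is
    # the key, kept when the answer confidence is highest
    print(f"{q} -> {out['answer']} (score={out['score']:.3f})")
```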
【2】 Membership Inference Attacks Against Recommender Systems
Link: https://arxiv.org/abs/2109.08045
Authors: Minxing Zhang, Zhaochun Ren, Zihan Wang, Pengjie Ren, Zhumin Chen, Pengfei Hu, Yang Zhang
Affiliations: Shandong University; CISPA Helmholtz Center for Information Security
Abstract: Recently, recommender systems have achieved promising performances and become
one of the most widely used web applications. However, recommender systems are
often trained on highly sensitive user data, thus potential data leakage from
recommender systems may lead to severe privacy problems.
In this paper, we make the first attempt on quantifying the privacy leakage
of recommender systems through the lens of membership inference. In contrast
with traditional membership inference against machine learning classifiers, our
attack faces two main differences. First, our attack is on the user-level but
not on the data sample-level. Second, the adversary can only observe the
ordered recommended items from a recommender system instead of prediction
results in the form of posterior probabilities. To address the above
challenges, we propose a novel method by representing users from relevant
items. Moreover, a shadow recommender is established to derive the labeled
training data for training the attack model. Extensive experimental results
show that our attack framework achieves a strong performance. In addition, we
design a defense mechanism to effectively mitigate the membership inference
threat of recommender systems.
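A toy sketch of the attack pipeline: a user is represented through item embeddings of the items recommended to them versus the items they interacted with, and a classifier fit on a shadow recommender's labeled examples scores membership. Every name and the synthetic data below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def user_feature(recommended_ids, interacted_ids, item_emb):
    """Represent a user by the gap between the centroid of recommended items
    and the centroid of items the user interacted with (simplified)."""
    rec = item_emb[recommended_ids].mean(axis=0)
    ref = item_emb[interacted_ids].mean(axis=0)
    return rec - ref

# The shadow recommender supplies labeled examples: 1 = member of training data.
rng = np.random.default_rng(0)
X_shadow = rng.normal(size=(200, 32))            # user_feature vectors
y_shadow = rng.integers(0, 2, 200)               # membership labels
attack_model = LogisticRegression(max_iter=1000).fit(X_shadow, y_shadow)
membership_scores = attack_model.predict_proba(X_shadow[:5])[:, 1]
```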
【3】 KnowMAN: Weakly Supervised Multinomial Adversarial Networks
Link: https://arxiv.org/abs/2109.07994
Authors: Luisa März, Ehsaneddin Asgari, Fabienne Braune, Franziska Zimmermann, Benjamin Roth
Affiliations: Digital Philology, Research Group Data Mining and Machine Learning, University of Vienna, Austria; NLP Expert Center, Data:Lab, Volkswagen AG, Munich, Germany
Note: 9 pages, 3 figures, 2 tables, accepted to EMNLP 2021
Abstract: The absence of labeled data for training neural models is often addressed by
leveraging knowledge about the specific task, resulting in heuristic but noisy
labels. The knowledge is captured in labeling functions, which detect certain
regularities or patterns in the training samples and annotate corresponding
labels for training. This process of weakly supervised training may result in
an over-reliance on the signals captured by the labeling functions and hinder
models to exploit other signals or to generalize well. We propose KnowMAN, an
adversarial scheme that enables control over the influence of signals associated
with specific labeling functions. KnowMAN forces the network to learn
representations that are invariant to those signals and to pick up other
signals that are more generally associated with an output label. KnowMAN
strongly improves results compared to direct weakly supervised learning with a
pre-trained transformer language model and a feature-based baseline.
【4】 Targeted Attack on Deep RL-based Autonomous Driving with Learned Visual Patterns
Link: https://arxiv.org/abs/2109.07723
Authors: Prasanth Buddareddygari, Travis Zhang, Yezhou Yang, Yi Ren
Affiliations: Cornell University
Note: 7 pages, 4 figures
Abstract: Recent studies demonstrated the vulnerability of control policies learned
through deep reinforcement learning against adversarial attacks, raising
concerns about the application of such models to risk-sensitive tasks such as
autonomous driving. Threat models for these demonstrations are limited to (1)
targeted attacks through real-time manipulation of the agent's observation, and
(2) untargeted attacks through manipulation of the physical environment. The
former assumes full access to the agent's states/observations at all times,
while the latter has no control over attack outcomes. This paper investigates
the feasibility of targeted attacks through visually learned patterns placed on
a physical object in the environment, a threat model that combines the
practicality and effectiveness of the existing ones. Through analysis, we
demonstrate that a pre-trained policy can be hijacked within a time window,
e.g., performing an unintended self-parking, when an adversarial object is
present. To enable the attack, we adopt an assumption that the dynamics of both
the environment and the agent can be learned by the attacker. Lastly, we
empirically show the effectiveness of the proposed attack on different driving
scenarios, perform a location robustness test, and study the tradeoff between
the attack strength and its effectiveness.
【5】 Adversarial Attacks against Deep Learning Based Power Control in Wireless Communications
Link: https://arxiv.org/abs/2109.08139
Authors: Brian Kim, Yi Shi, Yalin E. Sagduyu, Tugba Erpek, Sennur Ulukus
Affiliations: Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, USA; Intelligent Automation, Inc., Rockville, MD, USA
Abstract: We consider adversarial machine learning based attacks on power allocation
where the base station (BS) allocates its transmit power to multiple orthogonal
subcarriers by using a deep neural network (DNN) to serve multiple user
equipments (UEs). The DNN that corresponds to a regression model is trained
with channel gains as the input and allocated transmit powers as the output.
While the BS allocates the transmit power to the UEs to maximize rates for all
UEs, there is an adversary that aims to minimize these rates. The adversary may
be an external transmitter that aims to manipulate the inputs to the DNN by
interfering with the pilot signals that are transmitted to measure the channel
gain. Alternatively, the adversary may be a rogue UE that transmits fabricated
channel estimates to the BS. In both cases, the adversary carefully crafts
adversarial perturbations to manipulate the inputs to the DNN of the BS subject
to an upper bound on the strengths of these perturbations. We consider the
attacks targeted on a single UE or all UEs. We compare these attacks with a
benchmark, where the adversary scales down the input to the DNN. We show that
adversarial attacks are much more effective than the benchmark attack in terms
of reducing the rate of communications. We also show that adversarial attacks
are robust to the uncertainty at the adversary including the erroneous
knowledge of channel gains and the potential errors in exercising the attacks
exactly as specified.
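A generic sketch of crafting such input perturbations with a gradient-sign step under an L-infinity budget follows. The proxy objective (reducing the total allocated power) and the bound are assumptions standing in for the paper's rate-minimization objectives and threat models.

```python
import torch

def craft_perturbation(model, channel_gains, epsilon=0.01):
    """FGSM-style perturbation of the channel-gain inputs of a power
    allocation DNN, bounded in L-infinity norm by epsilon."""
    x = channel_gains.detach().clone().requires_grad_(True)
    proxy = model(x).sum()            # stand-in for the sum of UE rates
    proxy.backward()
    with torch.no_grad():
        x_adv = x - epsilon * x.grad.sign()
    return x_adv.detach()
```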
Semi-/Weakly-/Un-/Fully-Supervised | Uncertainty | Active Learning (5 papers)
【1】 Semi-Supervised Visual Representation Learning for Fashion Compatibility
Link: https://arxiv.org/abs/2109.08052
Authors: Ambareesh Revanur, Vijay Kumar, Deepthi Sharma
Affiliations: Carnegie Mellon University, USA; Walmart Global Tech, Bangalore, India
Note: ACM RecSys'21 (9 pages). DOI: this https URL
Abstract: We consider the problem of complementary fashion prediction. Existing
approaches focus on learning an embedding space where fashion items from
different categories that are visually compatible are closer to each other.
However, creating such labeled outfits is labor-intensive, and it is not feasible to
generate all possible outfit combinations, especially with large fashion
catalogs. In this work, we propose a semi-supervised learning approach where we
leverage large unlabeled fashion corpus to create pseudo-positive and
pseudo-negative outfits on the fly during training. For each labeled outfit in
a training batch, we obtain a pseudo-outfit by matching each item in the
labeled outfit with unlabeled items. Additionally, we introduce consistency
regularization to ensure that representations of the original images and their
transformations are consistent to implicitly incorporate colour and other
important attributes through self-supervision. We conduct extensive experiments
on Polyvore, Polyvore-D and our newly created large-scale Fashion Outfits
datasets, and show that our approach with only a fraction of labeled examples
performs on-par with completely supervised methods.
【2】 Self-supervised Contrastive Learning for EEG-based Sleep Staging
Link: https://arxiv.org/abs/2109.07839
Authors: Xue Jiang, Jianhui Zhao, Bo Du, Zhiyong Yuan
Affiliations: Wuhan University, Wuhan, China
Note: IJCNN 2021
Abstract: EEG signals are usually simple to obtain but expensive to label. Although
supervised learning has been widely used in the field of EEG signal analysis,
its generalization performance is limited by the amount of annotated data.
Self-supervised learning (SSL), as a popular learning paradigm in computer
vision (CV) and natural language processing (NLP), can employ unlabeled data to
make up for the data shortage of supervised learning. In this paper, we propose
a self-supervised contrastive learning method of EEG signals for sleep stage
classification. During the training process, we set up a pretext task for the
network in order to match the right transformation pairs generated from EEG
signals. In this way, the network improves the representation ability by
learning the general features of EEG signals. The robustness of the network
also gets improved in dealing with diverse data, that is, extracting constant
features from changing data. In detail, the network's performance depends on
the choice of transformations and the amount of unlabeled data used in the
training process of self-supervised learning. Empirical evaluations on the
Sleep-edf dataset demonstrate the competitive performance of our method on
sleep staging (88.16% accuracy and 81.96% F1 score) and verify the
effectiveness of SSL strategy for EEG signal analysis in limited labeled data
regimes. All code is provided publicly online.
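A common way to implement the pretext task of matching transformation pairs is an NT-Xent contrastive loss over two augmented views of each EEG epoch. The sketch below assumes this standard formulation, which may differ in detail from the paper's loss.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """z1, z2: (B, dim) embeddings of two augmentations of the same signals."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)              # (2B, dim)
    sim = z @ z.t() / temperature               # cosine similarity logits
    sim.fill_diagonal_(float("-inf"))           # exclude self-pairs
    b = z1.shape[0]
    # the positive of sample i is its other view, at index (i + B) mod 2B
    targets = (torch.arange(2 * b) + b) % (2 * b)
    return F.cross_entropy(sim, targets)
```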
【3】 Machine-Learned HASDM Model with Uncertainty Quantification
Link: https://arxiv.org/abs/2109.07651
Authors: Richard J. Licata, Piyush M. Mehta, W. Kent Tobiska, S. Huzurbazar
Affiliations: Department of Mechanical and Aerospace Engineering, West Virginia University, Morgantown, West Virginia, USA; Space Environment Technologies, Pacific Palisades, California, USA
Abstract: The first thermospheric neutral mass density model with robust and reliable
uncertainty estimates is developed based on the SET HASDM density database.
This database, created by Space Environment Technologies (SET), contains 20
years of outputs from the U.S. Space Force's High Accuracy Satellite Drag Model
(HASDM), which represents the state-of-the-art for density and drag modeling.
We utilize principal component analysis (PCA) for dimensionality reduction,
creating the coefficients upon which nonlinear machine-learned (ML) regression
models are trained. These models use three unique loss functions: mean square
error (MSE), negative logarithm of predictive density (NLPD), and continuous
ranked probability score (CRPS). Three input sets are also tested, showing
improved performance when introducing time histories for geomagnetic indices.
These models leverage Monte Carlo (MC) dropout to provide uncertainty
estimates, and the use of the NLPD loss function results in well-calibrated
uncertainty estimates without sacrificing model accuracy (<10% mean absolute
error). By comparing the best HASDM-ML model to the HASDM database along
satellite orbits, we found that the model provides robust and reliable
uncertainties in the density space over all space weather conditions. A
storm-time comparison shows that HASDM-ML also supplies meaningful uncertainty
measurements during extreme events.
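The MC dropout mechanism named in the abstract can be sketched in a few lines: dropout stays stochastic at inference and the spread over repeated forward passes is reported as uncertainty. This is the generic technique, not the paper's full pipeline.

```python
import torch

def mc_dropout_predict(model, x, n_samples=100):
    """Predictive mean and standard deviation from stochastic forward passes.

    model.train() keeps nn.Dropout layers active; in a real model with
    batch-norm layers, only the dropout modules should be toggled.
    """
    model.train()
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)
```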
【4】 Modern Cybersecurity Solution using Supervised Machine Learning
Link: https://arxiv.org/abs/2109.07593
Authors: Mustafa Sakhai, Maciej Wielgosz
Affiliations: AGH University of Science and Technology, Kraków, Poland
Note: 17 pages, 8 figures
Abstract: Cybersecurity is essential, and attacks are rapidly growing and getting more
challenging to detect. The traditional firewall and intrusion detection system,
though widely used and recommended, fails to detect new attacks, zero-day
attacks, and traffic patterns that do not match any configured rules.
Therefore, Machine Learning (ML) can be an efficient and
cost-reduced solution in cybersecurity.
We used Netflow datasets to extract features after applying data analysis.
Then, a selection process has been applied to compare these features with one
another. Our experiments focus on how efficient machine learning algorithms can
detect Bot traffic, Malware traffic, and background traffic. We managed to get
a 0.903 precision value from a dataset that has 6.5% Bot flows, 1.57% Normal
flows, 0.18% Command&Control (C&C) flows, and 91.7% background flows, from
2,753,884 total flows. The results show low false-negative rates with few
false-positive detections.
【5】 A Multi-Task Cross-Task Learning Architecture for Ad-hoc Uncertainty Estimation in 3D Cardiac MRI Image Segmentation
Link: https://arxiv.org/abs/2109.07702
Authors: S. M. Kamrul Hasan, Cristian A. Linte
Affiliations: Chester F. Carlson Center for Imaging Science; Department of Biomedical Engineering, Rochester Institute of Technology, Rochester, NY
Note: 4 pages, 3 figures
Abstract: Medical image segmentation has benefited significantly from deep
learning architectures. Furthermore, semi-supervised learning (SSL) has
recently been a growing trend for improving a model's overall performance by
leveraging abundant unlabeled data. Moreover, learning multiple tasks within
the same model further improves model generalizability. To generate smoother
and more accurate segmentation masks from 3D cardiac MR images, we present a
Multi-task Cross-task learning consistency approach to enforce the correlation
between the pixel-level (segmentation) and the geometric-level (distance map)
tasks. Our extensive experimentation with varied quantities of labeled data in
the training sets justifies the effectiveness of our model for the segmentation
of the left atrial cavity from Gadolinium-enhanced magnetic resonance (GE-MR)
images. With the incorporation of uncertainty estimates to detect failures in
the segmentation masks generated by CNNs, our study further showcases the
potential of our model to flag low-quality segmentation from a given model.
Transfer | Zero/Few/One-Shot | Adaptation (6 papers)
【1】 Personalized Federated Learning for Heterogeneous Clients with Clustered Knowledge Transfer
Link: https://arxiv.org/abs/2109.08119
Authors: Yae Jee Cho, Jianyu Wang, Tarun Chiruvolu, Gauri Joshi
Affiliations: ECE Department, Carnegie Mellon University, Pittsburgh, PA; School of Computer Science
Abstract: Personalized federated learning (FL) aims to train model(s) that can perform
well for individual clients that are highly data and system heterogeneous. Most
work in personalized FL, however, assumes using the same model architecture at
all clients and increases the communication cost by sending/receiving models.
This may not be feasible for realistic scenarios of FL. In practice, clients
have highly heterogeneous system-capabilities and limited communication
resources. In our work, we propose a personalized FL framework, PerFed-CKT,
where clients can use heterogeneous model architectures and do not directly
communicate their model parameters. PerFed-CKT uses clustered co-distillation,
where clients use logits to transfer their knowledge to other clients that have
similar data-distributions. We theoretically show the convergence and
generalization properties of PerFed-CKT and empirically show that PerFed-CKT
achieves high test accuracy with several orders of magnitude lower
communication cost compared to the state-of-the-art personalized FL schemes.
【2】 On the inductive biases of deep domain adaptation
Link: https://arxiv.org/abs/2109.07920
Authors: Rodrigue Siry, Louis Hémadou, Loïc Simon, Frédéric Jurie
Affiliations: UNICAEN - GREYC - CNRS; ENPC; UNICAEN - ENSICAEN - GREYC - CNRS; SAFRAN
Note: 10 pages, 8 figures
Abstract: Domain alignment is currently the most prevalent solution to unsupervised
domain-adaptation tasks and is often presented as a minimizer of some
theoretical upper bound on risk in the target domain. However, further work has
revealed severe inadequacies between theory and practice: we consolidate this
analysis and confirm that imposing domain invariance on features is neither
necessary nor sufficient to obtain low target risk. We instead argue that
successful deep domain adaptation relies largely on hidden inductive biases found
in the common practice, such as model pre-training or design of encoder
architecture. We perform various ablation experiments on popular benchmarks and
our own synthetic transfers to illustrate their role in prototypical
situations. To conclude our analysis, we propose to meta-learn parametric
inductive biases to solve specific transfers and show their superior
performance over handcrafted heuristics.
【3】 Adaptive Control of Quadratic Costs in Linear Stochastic Differential Equations
Link: https://arxiv.org/abs/2109.07630
Authors: Mohamad Kazem Shirani Faradonbeh, Mohamad Sadegh Shirani Faradonbeh
Abstract: We study a canonical problem in adaptive control; design and analysis of
policies for minimizing quadratic costs in unknown continuous-time linear
dynamical systems. We address important challenges including accuracy of
learning the unknown parameters of the underlying stochastic differential
equation, as well as full analyses of performance degradation due to
sub-optimal actions (i.e., regret). Then, an easy-to-implement algorithm for
balancing exploration versus exploitation is proposed, followed by theoretical
guarantees showing a square-root of time regret bound. Further, we present
tight results for assuring system stability and for specifying fundamental
limits for regret. To establish the presented results, multiple novel technical
frameworks are developed, which can be of independent interests.
【4】 Towards Zero-shot Cross-lingual Image Retrieval and Tagging
Link: https://arxiv.org/abs/2109.07622
Authors: Pranav Aggarwal, Ritiz Tambi, Ajinkya Kale
Affiliations: Adobe Inc., San Jose, USA
Note: Presented at Workshop on Multilingual Search, in conjunction with the 30th Web Conference 2021. arXiv admin note: substantial text overlap with arXiv:2012.05107
Abstract: There has been a recent spike in interest in multi-modal Language and Vision
problems. On the language side, most of these models primarily focus on English
since most multi-modal datasets are monolingual. We try to bridge this gap with
a zero-shot approach for learning multi-modal representations using
cross-lingual pre-training on the text side. We present a simple yet practical
approach for building a cross-lingual image retrieval model which trains on a
monolingual training dataset but can be used in a zero-shot cross-lingual
fashion during inference. We also introduce a new objective function which
tightens the text embedding clusters by pushing dissimilar texts away from each
other. For evaluation, we introduce a new 1K multi-lingual MSCOCO2014 caption
test dataset (XTD10) in 7 languages that we collected using a crowdsourcing
platform. We use this as the test set for zero-shot model performance across
languages. We also demonstrate how a cross-lingual model can be used for
downstream tasks like multi-lingual image tagging in a zero shot manner. XTD10
dataset is made publicly available here:
https://github.com/adobe-research/Cross-lingual-Test-Dataset-XTD10.
【5】 On the Complementarity of Data Selection and Fine Tuning for Domain Adaptation
Link: https://arxiv.org/abs/2109.07591
Authors: Dan Iter, David Grangier
Affiliations: Stanford University; Google Brain
Abstract: Domain adaptation of neural networks commonly relies on three training
phases: pretraining, selected data training and then fine tuning. Data
selection improves target domain generalization by training further on
pretraining data identified by relying on a small sample of target domain data.
This work examines the benefit of data selection for language modeling and
machine translation. Our experiments assess the complementarity of selection
with fine tuning and result in practical recommendations: (i) selected data
must be similar to the fine-tuning domain but not so much as to erode the
complementary effect of fine-tuning; (ii) there is a trade-off between
selecting little data for fast but limited progress or much data for slow but
long lasting progress; (iii) data selection can be applied early during
pretraining, with performance gains comparable to a long pretraining session;
(iv) data selection from domain classifiers is often more effective than the
popular contrastive data selection method.
【6】 How to Simplify Search: Classification-wise Pareto Evolution for One-shot Neural Architecture Search
Link: https://arxiv.org/abs/2109.07582
Authors: Lianbo Ma, Nan Li, Guo Yu, Xiaoyu Geng, Min Huang, Xingwei Wang
Affiliations: East China University of Science and Technology
Abstract: In the deployment of deep neural models, how to effectively and automatically
find feasible deep models under diverse design objectives is fundamental. Most
existing neural architecture search (NAS) methods utilize surrogates to predict
the detailed performance (e.g., accuracy and model size) of a candidate
architecture during the search, which however is complicated and inefficient.
In contrast, we aim to learn an efficient Pareto classifier to simplify the
search process of NAS by transforming the complex multi-objective NAS task into
a simple Pareto-dominance classification task. To this end, we propose a
classification-wise Pareto evolution approach for one-shot NAS, where an online
classifier is trained to predict the dominance relationship between the
candidate and constructed reference architectures, instead of using surrogates
to fit the objective functions. The main contribution of this study is to
change supernet adaptation into a Pareto classifier. Besides, we design two
adaptive schemes to select the reference set of architectures for constructing
classification boundary and regulate the rate of positive samples over negative
ones, respectively. We compare the proposed evolution approach with
state-of-the-art approaches on widely-used benchmark datasets, and experimental
results indicate that the proposed approach outperforms other approaches and
have found a number of neural architectures with different model sizes ranging
from 2M to 6M under diverse objectives and constraints.
Reinforcement Learning (4 papers)
【1】 Comparison and Unification of Three Regularization Methods in Batch Reinforcement Learning
Link: https://arxiv.org/abs/2109.08134
Authors: Sarah Rathnam, Susan A. Murphy, Finale Doshi-Velez
Affiliations: Department of Applied Mathematics, Harvard University; Department of Computer Science, Harvard University; Department of Statistics
Note: ICML Workshop on Reinforcement Learning Theory 2021
Abstract: In batch reinforcement learning, there can be poorly explored state-action
pairs resulting in poorly learned, inaccurate models and poorly performing
associated policies. Various regularization methods can mitigate the problem of
learning overly-complex models in Markov decision processes (MDPs), however
they operate in technically and intuitively distinct ways and lack a common
form in which to compare them. This paper unifies three regularization methods
in a common framework -- a weighted average transition matrix. Considering
regularization methods in this common form illuminates how the MDP structure
and the state-action pair distribution of the batch data set influence the
relative performance of regularization methods. We confirm intuitions generated
from the common framework by empirical evaluation across a range of MDPs and
data collection policies.
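A minimal sketch of a weighted average transition matrix, the common form in which the paper casts the three regularizers; the count-based weight n / (n + alpha) below is an illustrative choice, not one of the paper's derived weightings.

```python
import numpy as np

def regularized_transitions(counts, prior, alpha=1.0):
    """counts: (S, A, S) visit counts from the batch data set;
    prior: (S, S) prior transition matrix (e.g., uniform)."""
    n = counts.sum(axis=-1, keepdims=True)          # (S, A, 1)
    p_hat = np.divide(counts, np.maximum(n, 1))     # MLE where data exists
    w = n / (n + alpha)                             # trust data as n grows
    return w * p_hat + (1 - w) * prior[:, None, :]  # weighted average
```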
【2】 Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
Link: https://arxiv.org/abs/2109.08128
Authors: Tianhe Yu, Aviral Kumar, Yevgen Chebotar, Karol Hausman, Sergey Levine, Chelsea Finn
Affiliations: Stanford University; Google Research; UC Berkeley (*equal contribution)
Abstract: Offline reinforcement learning (RL) algorithms have shown promising results
in domains where abundant pre-collected data is available. However, prior
methods focus on solving individual problems from scratch with an offline
dataset without considering how an offline RL agent can acquire multiple
skills. We argue that a natural use case of offline RL is in settings where we
can pool large amounts of data collected in various scenarios for solving
different tasks, and utilize all of this data to learn behaviors for all the
tasks more effectively rather than training each one in isolation. However,
sharing data across all tasks in multi-task offline RL performs surprisingly
poorly in practice. Through a thorough empirical analysis, we find that sharing data can
actually exacerbate the distributional shift between the learned policy and the
dataset, which in turn can lead to divergence of the learned policy and poor
performance. To address this challenge, we develop a simple technique for
data-sharing in multi-task offline RL that routes data based on the improvement
over the task-specific data. We call this approach conservative data sharing
(CDS), and it can be applied with multiple single-task offline RL methods. On a
range of challenging multi-task locomotion, navigation, and vision-based
robotic manipulation problems, CDS achieves the best or comparable performance
compared to prior offline multi-task RL methods and previous data sharing
approaches.
【3】 Estimation of Warfarin Dosage with Reinforcement Learning
Link: https://arxiv.org/abs/2109.07564
Authors: Arpita Vats
Affiliations: Department of Computer Science, Boston University, Boston, USA
Abstract: This paper attempts to use reinforcement learning to model the proper
dosage of Warfarin for patients. It first examines two baselines:
a fixed model of 35 mg/week dosages and a linear model that relies on patient
data. We implemented a LinUCB bandit that improved performance measured on
regret and percent incorrect. On top of the LinUCB bandit, we experimented with
online supervised learning and reward reshaping to boost performance. Our
results clearly beat the baselines and show the promise of using multi-armed
bandits and artificial intelligence to aid physicians in deciding proper
dosages.
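For reference, a compact implementation of the disjoint LinUCB algorithm (Li et al., 2010) that the paper builds on; the warfarin features, dosage arms, and reward design are not reproduced here.

```python
import numpy as np

class LinUCB:
    """Each arm keeps A = I + sum(x x^T) and b = sum(r x); an arm's score is
    the ridge estimate theta^T x plus a bonus alpha * sqrt(x^T A^-1 x)."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```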
【4】 Short Quantum Circuits in Reinforcement Learning Policies for the Vehicle Routing Problem
Link: https://arxiv.org/abs/2109.07498
Authors: Fabio Sanches, Sean Weinberg, Takanori Ide, Kazumitsu Kamiya
Affiliations: QC Ware Corp., Palo Alto, CA, USA; AISIN CORPORATION, Tokyo Research Center, Chiyoda-ku, Tokyo, Japan; Aisin Technical Center of America, San Jose, CA, USA
Note: 15 pages, 9 figures
Abstract: Quantum computing and machine learning have potential for symbiosis. However,
in addition to the hardware limitations from current devices, there are still
basic issues that must be addressed before quantum circuits can usefully
incorporate with current machine learning tasks. We report a new strategy for
such an integration in the context of attention models used for reinforcement
learning. Agents that implement attention mechanisms have successfully been
applied to certain cases of combinatorial routing problems by first encoding
nodes on a graph and then sequentially decoding nodes until a route is
selected. We demonstrate that simple quantum circuits can be used in place of
classical attention head layers while maintaining performance. Our method
modifies the networks used in [1] by replacing key and query vectors for every
node with quantum states that are entangled before being measured. The
resulting hybrid classical-quantum agent is tested in the context of vehicle
routing problems where its performance is competitive with the original
classical approach. We regard our model as a prototype that can be scaled up
and as an avenue for further study on the role of quantum computing in
reinforcement learning.
Meta-Learning (1 paper)
【1】 Sign-MAML: Efficient Model-Agnostic Meta-Learning by SignSGD
Link: https://arxiv.org/abs/2109.07497
Authors: Chen Fan, Parikshit Ram, Sijia Liu
Affiliations: College of Information and Computer Sciences, University of Massachusetts Amherst; IBM Research; Computer Science and Engineering, Michigan State University; MIT-IBM Watson AI Lab
Abstract: We propose a new computationally-efficient first-order algorithm for
Model-Agnostic Meta-Learning (MAML). The key enabling technique is to interpret
MAML as a bilevel optimization (BLO) problem and leverage sign-based
SGD (signSGD) as a lower-level optimizer of BLO. We show that MAML, through the
lens of signSGD-oriented BLO, naturally yields an alternating optimization
scheme that just requires first-order gradients of a learned meta-model. We
term the resulting MAML algorithm Sign-MAML. Compared to the conventional
first-order MAML (FO-MAML) algorithm, Sign-MAML is theoretically-grounded as it
does not impose any assumption on the absence of second-order derivatives
during meta training. In practice, we show that Sign-MAML outperforms FO-MAML
in various few-shot image classification tasks, and compared to MAML, it
achieves a much more graceful tradeoff between classification accuracy and
computation efficiency.
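The lower-level update is ordinary signSGD on the task loss, sketched below. The helper names are illustrative, and the full Sign-MAML alternation with the upper-level meta-update is omitted.

```python
import torch

def signsgd_inner_step(meta_params, loss_fn, task_batch, lr=0.01):
    """One signSGD adaptation step on a task: w <- w - lr * sign(grad)."""
    loss = loss_fn(meta_params, task_batch)
    grads = torch.autograd.grad(loss, meta_params)
    # keeping only gradient signs is what lets the meta-update avoid
    # second-order derivatives through the inner step
    return [p - lr * g.sign() for p, g in zip(meta_params, grads)]
```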
Medicine-Related (6 papers)
【1】 The pitfalls of using open data to develop deep learning solutions for COVID-19 detection in chest X-rays
Link: https://arxiv.org/abs/2109.08020
Authors: Rachael Harkness, Geoff Hall, Alejandro F Frangi, Nishant Ravikumar, Kieran Zucker
Affiliations: CISTIB Centre for Computational Imaging and Simulation Technologies in Biomedicine, School of Computing, University of Leeds, Leeds, United Kingdom; Leeds Institute of Medical Research at St James's, United Kingdom
Note: To be published in MedInfo 21 - 18th World Congress on Medical and Health Informatics; 5 pages, 5 figures
Abstract: Since the emergence of COVID-19, deep learning models have been developed to
identify COVID-19 from chest X-rays. With little to no direct access to
hospital data, the AI community relies heavily on public data comprising
numerous data sources. Model performance results have been exceptional when
training and testing on open-source data, surpassing the reported capabilities
of AI in pneumonia detection prior to the COVID-19 outbreak. In this study,
impactful models are trained on a widely used open-source dataset and tested on an
external test set and a hospital dataset, for the task of classifying chest
X-rays into one of three classes: COVID-19, non-COVID pneumonia and
no-pneumonia. Classification performance of the models investigated is
evaluated through ROC curves, confusion matrices and standard classification
metrics. Explainability modules are implemented to explore the image features
most important to classification. Data analysis and model evaluations show that
the popular open-source dataset COVIDx is not representative of the real
clinical problem and that results from testing on this are inflated. Dependence
on open-source data can leave models vulnerable to bias and confounding
variables, requiring careful analysis to develop clinically useful/viable AI
tools for COVID-19 detection in chest X-rays.
【2】 Telehealthcare and Covid-19: A Noninvasive & Low Cost Invasive, Scalable and Multimodal Real-Time Smartphone Application for Early Diagnosis of SARS-CoV-2 Infection
Link: https://arxiv.org/abs/2109.07846
Authors: Abdullah Bin Shams, Md. Mohsin Sarker Raihan, Md. Mohi Uddin Khan, Rahat Bin Preo, Ocean Monjur
Affiliations: The Edward S. Rogers Sr. Department of Electrical & Computer Engineering, University of Toronto, Toronto, ON, Canada; Department of Biomedical Engineering, Khulna University of Engineering & Technology, Khulna, Bangladesh
Note: 14 pages. This article has been submitted for review to a prestigious journal
Abstract: The global coronavirus pandemic overwhelmed many health care systems,
enforcing lockdowns and encouraging work from home to control the spread of the
virus and prevent overrunning of hospitalized patients. This prompted a sharp
widespread use of telehealth to provide low-risk care for patients.
Nevertheless, continuous mutation into new variants and the widespread
unavailability of test kits, especially in developing countries, pose a
challenge to controlling potential future waves of infection. In this paper, we
propose a novel Smartphone application-based platform for early diagnosis of
possible Covid-19 infected patients. The application provides three modes of
diagnosis from possible symptoms, cough sound, and specific blood biomarkers.
When a user chooses a particular setting and provides the necessary
information, it sends the data to a trained machine learning (ML) model
deployed in a remote server using the internet. The ML algorithm then predicts
the possibility of contracting Covid-19 and sends the feedback to the user. The
entire procedure takes place in real-time. Our machine learning models can
identify Covid-19 patients with an accuracy of 100%, 95.65%, and 77.59% from
blood parameters, cough sound, and symptoms respectively. Moreover, the ML
sensitivity for blood and sound is 100%, which indicates correct identification
of Covid positive patients. This is significant in limiting the spread of the
virus. The multimodality offers multiplex diagnostic methods to better classify
possible infectees and together with the instantaneous nature of our technique,
demonstrates the power of telehealthcare as an easy and widespread low-cost
scalable diagnostic solution for future pandemics.
【3】 The Neural Metric Factorization for Computational Drug Repositioning
标题:神经度量分解在计算药物重新定位中的应用
链接:https://arxiv.org/abs/2109.07690
作者:Xinxing Yang,Genke Yang
机构:Ningbo Artificial Intelligence Institute, Shanghai Jiao Tong University, Department of Automation, Shanghai Jiao Tong University
备注:14 pages
摘要:计算药物重新定位旨在为上市药物发现新的治疗适应症,与传统药物开发相比具有成本低、开发周期短、可控性高等优点。矩阵分解模型由于其易于实现和良好的可扩展性,已成为计算药物重新定位的主流基础技术。然而,矩阵分解模型使用内积运算来表示药物与疾病之间的关联,缺乏表达能力。此外,药物或疾病的相似程度无法体现在各自的潜在因子向量上,这不符合药物发现的常识。因此,本文提出了一种计算药物重新定位的神经度量分解模型。我们创新性地将药物和疾病的潜在因子向量视为高维坐标系中的点,并提出广义欧几里德距离来表示药物和疾病之间的关联,以弥补内积运算的缺点。此外,通过将多个药物和疾病度量信息嵌入到潜在因子向量的编码空间中,使得相似药物或疾病的潜在因子向量更接近。最后,我们在两个真实数据集上进行了广泛的分析实验,以证明上述改进点的有效性和NMF模型的优越性。
摘要:Computational drug repositioning aims to discover new therapeutic
indications for marketed drugs and has the advantages of low cost, short development cycle,
and high controllability compared to traditional drug development. The matrix
factorization model has become a mainstream cornerstone technique for
computational drug repositioning due to its ease of implementation and
excellent scalability. However, the matrix factorization model uses the inner
product operation to represent the association between drugs and diseases,
which lacks expressive ability. Moreover, the degree of similarity of
drugs or diseases cannot be reflected in their respective latent factor
vectors, which does not accord with the common sense of drug discovery. Therefore, a
neural metric factorization model for computational drug repositioning is
proposed in this work. We novelly treat the latent factor vectors of drugs
and diseases as points in a high-dimensional coordinate system and propose a
generalized Euclidean distance to represent the association between drugs and
diseases to compensate for the shortcomings of the inner product operation.
Furthermore, by embedding multiple drug and disease metrics information into
the encoding space of the latent factor vector, the latent factor vectors of
similar drugs or diseases are made closer. Finally, we conduct extensive analysis
experiments on two real datasets to demonstrate the effectiveness of the above
improvement points and the superiority of the NMF model.
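The core change described above is replacing the inner product with a distance between latent points. A minimal numpy sketch of scoring a drug-disease pair this way; the per-dimension weight vector and the sign convention are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def association_score(p, q, w=None):
    """Score a drug-disease pair by a generalized Euclidean distance between
    their latent factor vectors (smaller distance = stronger association).
    `w` is an optional per-dimension weight."""
    w = np.ones_like(p) if w is None else w
    dist = np.sqrt(np.sum(w * (p - q) ** 2))
    return -dist  # negate so that higher scores mean stronger association

rng = np.random.default_rng(0)
drug, disease = rng.normal(size=16), rng.normal(size=16)
print(association_score(drug, disease))
```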
【4】 Interpretable Additive Recurrent Neural Networks For Multivariate Clinical Time Series
标题:多变量临床时间序列的可解释加性递归神经网络
链接:https://arxiv.org/abs/2109.07602
作者:Asif Rahman,Yale Chang,Jonathan Rubin
机构:Philips Research North America, Cambridge, MA, USA
摘要:带有递归神经网络(RNN)的时间序列模型具有较高的精度,但由于特征交互、时间交互和非线性变换,很难解释。可解释性在医疗保健等领域非常重要,在这些领域中,需要构建能够揭示其所学习到的关系的模型,以验证和信任模型预测。我们需要精确的时间序列模型,同时用户可以理解各个输入特征的贡献。我们提出了可解释RNN(I-RNN),它通过强制模型中变量之间的关系是可加性的来平衡模型的复杂性和准确性。RNN的隐藏状态之间的相互作用受到限制,并在最后一步以加性方式组合。I-RNN专门捕捉临床时间序列的独特特征,这些特征在时间上采样不均匀、异步获取,并且存在数据缺失。重要的是,隐藏状态激活表示与预测目标相关的特征系数,并且可以可视化为捕捉单个输入特征和结果之间全局关系的风险曲线。我们在Physionet 2012挑战数据集上评估I-RNN模型,以预测住院死亡率,并在一项真实世界的临床决策支持任务(预测重症监护病房的血流动力学干预)上进行评估。I-RNN以全局和局部特征重要性的形式提供解释,可与在手动工程特征上训练的决策树等高度可理解的模型相媲美,同时显著优于它们。I-RNN保持可理解性,同时提供与最先进的基于衰减和基于插值的递归时间序列模型相当的精度。真实临床数据集的实验结果驳斥了准确性和可解释性之间存在权衡的神话。
摘要:Time series models with recurrent neural networks (RNNs) can have high
accuracy but are unfortunately difficult to interpret as a result of
feature-interactions, temporal-interactions, and non-linear transformations.
Interpretability is important in domains like healthcare, where constructing
models that provide insight into the relationships they have learned is
required to validate and trust model predictions. We want accurate time series
models where users can understand the contribution of individual input
features. We present the Interpretable-RNN (I-RNN) that balances model
complexity and accuracy by forcing the relationship between variables in the
model to be additive. Interactions are restricted between hidden states of the
RNN and additively combined at the final step. I-RNN specifically captures the
unique characteristics of clinical time series, which are unevenly sampled in
time, asynchronously acquired, and have missing data. Importantly, the hidden
state activations represent feature coefficients that correlate with the
prediction target and can be visualized as risk curves that capture the global
relationship between individual input features and the outcome. We evaluate the
I-RNN model on the Physionet 2012 Challenge dataset to predict in-hospital
mortality, and on a real-world clinical decision support task: predicting
hemodynamic interventions in the intensive care unit. I-RNN provides
explanations in the form of global and local feature importances comparable to
highly intelligible models like decision trees trained on hand-engineered
features while significantly outperforming them. I-RNN remains intelligible
while providing accuracy comparable to state-of-the-art decay-based and
interpolation-based recurrent time series models. The experimental results on
real-world clinical datasets refute the myth that there is a tradeoff between
accuracy and interpretability.
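The additive constraint means the final logit is a sum of per-feature contributions, which is what makes the risk curves readable. A toy numpy sketch of that idea, with one independent recurrent track per feature combined only at the last step; this is an illustration of the additive principle, not the paper's exact architecture:

```python
import numpy as np

def i_rnn_forward(x, W, U, b, v):
    """x: (T, D) time series; one independent recurrent track per feature.
    Returns the additive logit and the per-feature contributions."""
    T, D = x.shape
    h = np.zeros(D)
    for t in range(T):
        h = np.tanh(W * x[t] + U * h + b)   # elementwise: no feature mixing
    contributions = v * h                   # per-feature risk contributions
    return contributions.sum(), contributions

rng = np.random.default_rng(0)
T, D = 48, 5
W, U, b, v = (rng.normal(size=D) for _ in range(4))
logit, contrib = i_rnn_forward(rng.normal(size=(T, D)), W, U, b, v)
print(logit, contrib)
```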
【5】 Federated Contrastive Learning for Decentralized Unlabeled Medical Images
标题:去中心化无标记医学图像的联合对比学习
链接:https://arxiv.org/abs/2109.07504
作者:Nanqing Dong,Irina Voiculescu
机构:Department of Computer Science, University of Oxford, Oxford, UK
备注:Accepted by MICCAI 2021
摘要:计算机视觉中的标签有效范例是基于对未标记数据的自我监督对比预训练,然后使用少量标签进行微调。在临床领域实际使用联邦计算环境并学习医学图像带来了特定的挑战。在这项工作中,我们提出了FedMoCo,一个强大的联邦对比学习(FCL)框架,它有效地利用了分散的未标记医疗数据。FedMoCo有两个新模块:元数据传输模块(节点间统计数据扩充模块)和自适应聚合模块(基于表征相似性分析的聚合模块)。据我们所知,这是FCL首次对医学图像进行研究。我们的实验表明,FedMoCo在为下游任务提取有意义的表示方面始终优于FedAvg(一种开创性的联邦学习框架)。我们进一步表明,FedMoCo可以显著减少下游任务(如新冠病毒-19检测)所需的标记数据量,以实现合理的性能。
摘要:A label-efficient paradigm in computer vision is based on self-supervised
contrastive pre-training on unlabeled data followed by fine-tuning with a small
number of labels. Making practical use of a federated computing environment in
the clinical domain and learning on medical images poses specific challenges.
In this work, we propose FedMoCo, a robust federated contrastive learning (FCL)
framework, which makes efficient use of decentralized unlabeled medical data.
FedMoCo has two novel modules: metadata transfer, an inter-node statistical
data augmentation module, and self-adaptive aggregation, an aggregation module
based on representational similarity analysis. To the best of our knowledge,
this is the first FCL work on medical images. Our experiments show that FedMoCo
can consistently outperform FedAvg, a seminal federated learning framework, in
extracting meaningful representations for downstream tasks. We further show
that FedMoCo can substantially reduce the amount of labeled data required in a
downstream task, such as COVID-19 detection, to achieve a reasonable
performance.
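For reference, the FedAvg baseline that FedMoCo is compared against aggregates client weights by a dataset-size-weighted average. A minimal sketch of that aggregation step, assuming each client model is a dict of numpy arrays; FedMoCo's metadata-transfer and self-adaptive aggregation modules are not reproduced here:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client model parameters (FedAvg aggregation)."""
    total = sum(client_sizes)
    keys = client_weights[0].keys()
    return {
        k: sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in keys
    }

# Two hypothetical clients with a single parameter tensor each.
c1 = {"conv.weight": np.ones((3, 3))}
c2 = {"conv.weight": np.zeros((3, 3))}
print(fedavg([c1, c2], client_sizes=[300, 100])["conv.weight"][0, 0])  # 0.75
```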
【6】 Quality-aware Cine Cardiac MRI Reconstruction and Analysis from Undersampled k-space Data
标题:基于欠采样k空间数据的质量感知电影心脏MRI重建与分析
链接:https://arxiv.org/abs/2109.07955
作者:Ines Machado,Esther Puyol-Anton,Kerstin Hammernik,Gastao Cruz,Devran Ugurlu,Bram Ruijsink,Miguel Castelo-Branco,Alistair Young,Claudia Prieto,Julia A. Schnabel,Andrew P. King
机构: School of Biomedical Engineering & Imaging Sciences, King’s College London, UK, Technical University of Munich, Germany, Biomedical Image Analysis Group, Imperial College London, UK, Department of Adult and Paediatric Cardiology, Guy’s and St Thomas’ NHS
摘要:电影心脏MRI通常用于心脏健康评估,但成像过程缓慢,通常需要多次屏气以获得足够的k空间轮廓,以确保良好的图像质量。在过去的几十年中,已经提出了几种基于欠采样的重建技术来加速电影心脏MRI的采集。然而,在采集之前,欠采样因子通常固定为保守值,以确保诊断图像质量,可能导致不必要的长扫描时间。在本文中,我们提出了一种端到端的质量感知电影短轴心脏MRI框架,该框架将图像采集和重建与后续任务(如分割、容积曲线分析和心脏功能参数估计)相结合。其目标是通过仅获取一小部分k空间数据来减少扫描时间,从而能够重建能够通过质量控制检查并产生可靠的心脏功能参数估计的图像。该框架包括一个用于从欠采样数据重建2D+t心脏电影MRI图像的深度学习模型、一个用于检测高质量重建的图像质量控制步骤,然后是一个用于双心室分割的深度学习模型、一个用于检测高质量分割的质量控制步骤,以及心脏功能参数的自动计算。为了证明所提出方法的可行性,我们使用从英国生物银行挑选的参与者队列(n=270,包括200名健康受试者和70名心肌病患者)进行模拟。我们的结果表明,我们可以在扫描时间从每层12秒减少到4秒的情况下生成质量可控的图像,从而能够可靠地估计心脏功能参数,如射血分数,平均绝对误差在5%以内。
摘要:Cine cardiac MRI is routinely acquired for the assessment of cardiac health,
but the imaging process is slow and typically requires several breath-holds to
acquire sufficient k-space profiles to ensure good image quality. Several
undersampling-based reconstruction techniques have been proposed during the
last decades to speed up cine cardiac MRI acquisition. However, the
undersampling factor is commonly fixed to conservative values before
acquisition to ensure diagnostic image quality, potentially leading to
unnecessarily long scan times. In this paper, we propose an end-to-end
quality-aware cine short-axis cardiac MRI framework that combines image
acquisition and reconstruction with downstream tasks such as segmentation,
volume curve analysis and estimation of cardiac functional parameters. The goal
is to reduce scan time by acquiring only a fraction of k-space data to enable
the reconstruction of images that can pass quality control checks and produce
reliable estimates of cardiac functional parameters. The framework consists of
a deep learning model for the reconstruction of 2D+t cardiac cine MRI images
from undersampled data, an image quality-control step to detect good quality
reconstructions, followed by a deep learning model for bi-ventricular
segmentation, a quality-control step to detect good quality segmentations and
automated calculation of cardiac functional parameters. To demonstrate the
feasibility of the proposed approach, we perform simulations using a cohort of
selected participants from the UK Biobank (n=270): 200 healthy subjects and 70
patients with cardiomyopathies. Our results show that we can produce
quality-controlled images in a scan time reduced from 12 to 4 seconds per
slice, enabling reliable estimates of cardiac functional parameters such as
ejection fraction within 5% mean absolute error.
蒸馏|知识提取(1篇)
【1】 DisUnknown: Distilling Unknown Factors for Disentanglement Learning
标题:DISUNKNOWN:解缠式学习中未知因素的提炼
链接:https://arxiv.org/abs/2109.08090
作者:Sitao Xiang,Yuming Gu,Pengda Xiang,Menglei Chai,Hao Li,Yajie Zhao,Mingming He
机构:University of Southern California,USC Institute for Creative Technologies,Snap Inc.
备注:Accepted for publication at ICCV 2021. Videos, demos and updates will be published at project website: this https URL
摘要:将数据分解为可解释和独立的因素对于可控生成任务至关重要。有了标记数据的可用性,监督有助于按预期对特定因素进行分离。然而,为实现完全监督的解纠缠,对每一个因素进行标记通常是昂贵的,甚至是不可能的。在本文中,我们采用一种通用设置,其中所有难以标记或识别的因素都封装为单个未知因素。在此背景下,我们提出了一个灵活的弱监督多因子解纠缠框架DisUnknown,该框架提取未知因子,以实现标记因子和未知因子的多条件生成。具体而言,采用两阶段训练方法,首先使用有效且稳健的训练方法对未知因素进行解纠缠,然后使用未知蒸馏对所有标记因素进行适当解纠缠,从而训练最终生成器。为了证明我们的方法的泛化能力和可扩展性,我们对多个基准数据集进行了定性和定量评估,并将其应用于复杂数据集上的各种实际应用。
摘要:Disentangling data into interpretable and independent factors is critical for
controllable generation tasks. With the availability of labeled data,
supervision can help enforce the separation of specific factors as expected.
However, it is often expensive or even impossible to label every single factor
to achieve fully-supervised disentanglement. In this paper, we adopt a general
setting where all factors that are hard to label or identify are encapsulated
as a single unknown factor. Under this setting, we propose a flexible
weakly-supervised multi-factor disentanglement framework DisUnknown, which
Distills Unknown factors for enabling multi-conditional generation regarding
both labeled and unknown factors. Specifically, a two-stage training approach
is adopted to first disentangle the unknown factor with an effective and robust
training method, and then train the final generator with the proper
disentanglement of all labeled factors utilizing the unknown distillation. To
demonstrate the generalization capacity and scalability of our method, we
evaluate it on multiple benchmark datasets qualitatively and quantitatively and
further apply it to various real-world applications on complicated datasets.
超分辨率|去噪|去模糊|去雾(1篇)
【1】 Super-resolution data assimilation
标题:超分辨率资料同化
链接:https://arxiv.org/abs/2109.08017
作者:Sébastien Barthélémy,Julien Brajard,Laurent Bertino,François Counillon
机构:Received: date Accepted: date
摘要:提高模式的分辨率可以改善数据同化系统的性能:首先,因为模式场与高分辨率观测值更一致,然后校正更持久,并且,通过集合数据同化,预测误差协方差得到改善。然而,分辨率的提高与计算成本的立方增长相关。在这里,我们测试一种受图像超分辨率技术启发的方法,称为"超分辨率数据同化"(SRDA)。从低分辨率预测开始,神经网络(NN)模拟高分辨率场,然后用于同化高分辨率观测。我们将SRDA应用于代表简化的地表海洋动力学的准地转模式,模式分辨率比参考高分辨率低四倍,并使用集合卡尔曼滤波数据同化方法。我们表明,SRDA优于低分辨率数据同化方法和采用三次样条插值代替神经网络的SRDA版本。神经网络预测低分辨率和高分辨率模型动力学之间的系统差异的能力解释了增强的性能,例如通过校正涡流传播速度的差异。SRDA使计算成本比LR数据同化系统(使用25个成员的集合)高55%,但将误差减少了40%,使性能非常接近HR系统(误差仅比HR系统高16%,而LR EnKF的误差比HR系统高92%)。SRDA不会降低集合系统的可靠性。
摘要:Increasing the resolution of a model can improve the performance of a data
assimilation system: first because model fields are in better agreement with
high-resolution observations, then because the corrections are better sustained and,
with ensemble data assimilation, the forecast error covariances are improved.
However, a resolution increase is associated with a cubic increase of the
computational costs. Here we test an approach inspired by image
super-resolution techniques, called "Super-resolution data assimilation"
(SRDA). Starting from a low-resolution forecast, a neural network (NN) emulates
a high-resolution field that is then used to assimilate high-resolution
observations. We apply the SRDA to a quasi-geostrophic model representing
simplified surface ocean dynamics, with a model resolution up to four times
lower than the reference high-resolution and we use the Ensemble Kalman Filter
data assimilation method. We show that SRDA outperforms the low-resolution data
assimilation approach and a SRDA version with cubic spline interpolation
instead of NN. The NN's ability to anticipate the systematic differences
between low and high resolution model dynamics explains the enhanced
performance, for example by correcting the difference of propagation speed of
eddies. While increasing the computational cost by 55\% above the LR data
assimilation system (using a 25-member ensemble), the SRDA reduces the errors
by 40\%, making the performance very close to that of the HR system (errors 16\%
larger, compared to 92\% larger for the LR EnKF). The reliability of the ensemble
system is not degraded by SRDA.
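The analysis step that SRDA wraps is a standard stochastic Ensemble Kalman Filter update applied to the NN-emulated high-resolution ensemble. A minimal numpy sketch of that update, with the neural upsampler stubbed out by a random placeholder ensemble:

```python
import numpy as np

def enkf_update(X, y, H, obs_std, rng):
    """Stochastic EnKF analysis. X: (n_state, n_ens) ensemble,
    y: (n_obs,) observations, H: (n_obs, n_state) observation operator."""
    n_ens = X.shape[1]
    Xm = X - X.mean(axis=1, keepdims=True)
    HX = H @ X
    HXm = HX - HX.mean(axis=1, keepdims=True)
    P_yy = HXm @ HXm.T / (n_ens - 1) + obs_std**2 * np.eye(len(y))
    P_xy = Xm @ HXm.T / (n_ens - 1)
    K = P_xy @ np.linalg.inv(P_yy)                               # Kalman gain
    Y = y[:, None] + obs_std * rng.normal(size=(len(y), n_ens))  # perturbed obs
    return X + K @ (Y - HX)

rng = np.random.default_rng(0)
X_hr = rng.normal(size=(100, 25))   # stand-in for the NN-emulated HR ensemble
H = np.eye(10, 100)                 # observe the first 10 state variables
y = rng.normal(size=10)
X_a = enkf_update(X_hr, y, H, obs_std=0.1, rng=rng)
print(X_a.shape)                    # (100, 25)
```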
自动驾驶|车辆|车道检测等(1篇)
【1】 Secure Your Ride: Real-time Matching Success Rate Prediction for Passenger-Driver Pairs
标题:保障您的乘车安全:乘客-司机对的实时匹配成功率预测
链接:https://arxiv.org/abs/2109.07571
作者:Yuandong Wang,Hongzhi Yin,Lian Wu,Tong Chen,Chunyang Liu
机构: Chen are with the School of Information Technologyand Electrical Engineering, The University of Queensland
备注:This article is accepted as a regular paper in an upcoming issue of the Transactions on Knowledge and Data Engineering
摘要:近年来,在线网约车平台已成为城市交通不可或缺的一部分。在乘客通过平台与司机匹配后,乘客和司机都可以通过单击简单地接受或取消乘坐。因此,准确预测一对乘客-司机是否是一对很好的搭档,对于网约车平台设计即时订单分配至关重要。然而,由于网约车平台的用户由双方组成,因此决策需要同时考虑司机和乘客双方的动态。这使得它比传统的在线广告任务更具挑战性。此外,不同城市的可用数据量严重不平衡,这给为数据稀少的小城市训练准确的模型带来了困难。尽管复杂的神经网络体系结构有助于在数据匮乏的情况下提高预测精度,但过于复杂的设计将阻碍模型在生产环境中提供及时预测的能力。在本文中,为了准确预测乘客-司机对的匹配成功率(MSR),我们提出了多视图模型(MV),该模型综合学习乘客、司机、行程订单以及上下文的动态特征之间的相互作用。针对数据不平衡问题,我们进一步设计了知识蒸馏框架(KD),以利用来自数据密集城市的知识补充模型对较小城市的预测能力,并生成一个简单的模型以支持高效部署。最后,我们在几个不同城市的真实数据集上进行了大量实验,证明了我们的解决方案的优越性。
摘要:In recent years, online ride-hailing platforms have become an indispensable
part of urban transportation. After a passenger is matched up with a driver by
the platform, both the passenger and the driver have the freedom to simply
accept or cancel a ride with one click. Hence, accurately predicting whether a
passenger-driver pair is a good match turns out to be crucial for ride-hailing
platforms to devise instant order assignments. However, since the users of
ride-hailing platforms consist of two parties, decision-making needs to
simultaneously account for the dynamics from both the driver and the passenger
sides. This makes it more challenging than traditional online advertising
tasks. Moreover, the amount of available data is severely imbalanced across
different cities, creating difficulties for training an accurate model for
smaller cities with scarce data. Though a sophisticated neural network
architecture can help improve the prediction accuracy under data scarcity, the
overly complex design will impede the model's capacity of delivering timely
predictions in a production environment. In this paper, to accurately predict
the matching success rate (MSR) of a passenger-driver pair, we propose the
Multi-View model (MV) which
comprehensively learns the interactions among the dynamic features of the
passenger, driver, trip order, as well as context. Regarding the data imbalance
problem, we further design the Knowledge Distillation framework (KD) to
supplement the model's predictive power for smaller cities using the knowledge
from cities with denser data and also generate a simple model to support
efficient deployment. Finally, we conduct extensive experiments on real-world
datasets from several different cities, which demonstrates the superiority of
our solution.
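The knowledge-distillation component can be sketched with the usual soft-target recipe: a small-city student is trained against both hard labels and the temperature-softened predictions of a teacher trained on data-rich cities. A generic PyTorch sketch of that objective (not the paper's exact KD formulation):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard soft-target distillation: hard-label cross-entropy plus
    temperature-scaled KL divergence to the teacher."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1 - alpha) * soft

s = torch.randn(8, 2, requires_grad=True)  # student logits (match / no match)
t = torch.randn(8, 2)                      # teacher logits from data-rich cities
y = torch.randint(0, 2, (8,))
distillation_loss(s, t, y).backward()
```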
点云|SLAM|雷达|激光|深度RGBD相关(1篇)
【1】 Lifting 2D Object Locations to 3D by Discounting LiDAR Outliers across Objects and Views
标题:通过对对象和视图中的LiDAR离群值进行折扣,将2D对象位置提升到3D位置
链接:https://arxiv.org/abs/2109.07945
作者:Robert McCraith,Eldar Insafudinov,Lukas Neumann,Andrea Vedaldi
机构:UniversityofOxford{robert
备注:ICRA 2022 submission
摘要:我们提出了一个系统,用于将2D遮罩对象预测和原始激光雷达点云自动转换为对象的完整3D边界框。因为激光雷达点云是局部的,所以直接将边界框拟合到点云是没有意义的。相反,我们建议获得好的结果需要在数据集中的\emph{all}对象之间通过多个帧共同共享信息。然后,我们对基线进行了三项改进。首先,我们解决了在这个空间中通过直接优化预测对象旋转的模糊性,同时仍然通过模型反向传播旋转预测。其次,我们明确地对异常值进行建模,并通过学习其典型模式来分配网络任务,从而更好地对其进行贴现。第三,当视频数据可用时,我们强制执行时间一致性。有了这些贡献,我们的方法明显优于以前的工作,尽管这些方法使用了更复杂的管道、3D模型和额外的人类注释的外部先验信息源。
摘要:We present a system for automatic converting of 2D mask object predictions
and raw LiDAR point clouds into full 3D bounding boxes of objects. Because the
LiDAR point clouds are partial, directly fitting bounding boxes to the point
clouds is meaningless. Instead, we suggest that obtaining good results requires
sharing information between \emph{all} objects in the dataset jointly, over
multiple frames. We then make three improvements to the baseline. First, we
address ambiguities in predicting the object rotations via direct optimization
in this space while still backpropagating rotation prediction through the
model. Second, we explicitly model outliers and task the network with learning
their typical patterns, thus better discounting them. Third, we enforce
temporal consistency when video data is available. With these contributions,
our method significantly outperforms previous work despite the fact that those
methods use significantly more complex pipelines, 3D models and additional
human-annotated external sources of prior information.
联邦学习|隐私保护|加密(2篇)
【1】 OpenFed: An Open-Source Security and Privacy Guaranteed Federated Learning Framework
标题:OpenFED:一种开源的安全和隐私保障的联邦学习框架
链接:https://arxiv.org/abs/2109.07852
作者:Chen Dengsheng
机构:National University of Defense Technology, China
备注:18 pages, 3 figures, 1 table
摘要:人工智能技术的广泛应用,从自动驾驶车辆到先进的医疗诊断,带来了许多好处。联合学习是一种新型的人工智能,它提供了一些技术来帮助弥合个人数据保护与研究和商业部署利用之间的差距,特别是在安全和隐私是关键问题的用例中。在这里,我们介绍OpenFed,这是一个开源软件框架,可以同时满足数据保护和利用的需求。在实践中,尽管本地数据可用性有限,OpenFed仍能在低信任环境中实现最先进的模型开发,这为可持续的协作模型开发和商业部署奠定了基础,缓解了对资产保护的担忧。此外,OpenFed还提供了一个端到端工具包,以促进联邦学习算法的开发,并提供了几个基准,以在不同的计算范式和配置下进行公平的性能比较。
摘要:The broad application of artificial intelligence techniques ranging from
self-driving vehicles to advanced medical diagnostics afford many benefits.
Federated learning is a new breed of artificial intelligence, offering
techniques to help bridge the gap between personal data protection and
utilization for research and commercial deployment, especially in the use-cases
where security and privacy are the key concerns. Here, we present OpenFed, an
open-source software framework to simultaneously address the demands for data
protection and utilization. In practice, OpenFed enables state-of-the-art model
development in low-trust environments despite limited local data availability,
which lays the groundwork for sustainable collaborative model development and
commercial deployment by alleviating concerns of asset protection. In addition,
OpenFed also provides an end-to-end toolkit to facilitate federated learning
algorithm development, as well as several benchmarks for fair performance
comparison under diverse computing paradigms and configurations.
【2】 Subspace Learning for Personalized Federated Optimization
标题:个人化联邦优化的子空间学习
链接:https://arxiv.org/abs/2109.07628
作者:Seok-Ju Hahn,Minwoo Jeong,Junghye Lee
机构: Department of Industrial Engineering, Ulsan National Institute of Science and Technology, AI Lab, Kakao Enterprise, Artificial Intelligence Graduate School, Ulsan National Institute of Science and Technology
摘要:由于数据几乎在任何地方生成和存储,从数据分散的环境中学习模型是许多人工智能驱动的服务提供商感兴趣的任务。尽管在这种情况下,联邦学习已被确定为主要解决方案,但在个性化方面仍有改进的余地。训练联邦学习系统通常侧重于优化全局模型,该模型以相同的方式部署到所有客户端设备。但是,单一的全局模型不足以使每个客户机在其性能上个性化,因为本地数据假定在客户机之间分布不完全相同。我们提出了一种通过集成学习的视角来解决这种情况的方法,该方法基于低损耗子空间连续体的构建,该连续体生成两个端点(即全局模型和局部模型)的高精度集成。通过在多个标准基准数据集上的大量实验,我们证明了我们的方法在个性化和不可见的客户评估设置中都取得了一致的收益。
摘要:As data is generated and stored almost everywhere, learning a model from a
data-decentralized setting is a task of interest for many AI-driven service
providers. Although federated learning has been established as the main solution in
such situations, there still exists room for improvement in terms of
personalization. Training federated learning systems usually focuses on
optimizing a global model that is identically deployed to all client devices.
However, a single global model is not sufficient to personalize performance
for each client, as local data is assumed not to be identically
distributed across clients. We propose a method to address this situation
through the lens of ensemble learning based on the construction of a low-loss
subspace continuum that generates a high-accuracy ensemble of two endpoints
(i.e. global model and local model). We demonstrate that our method achieves
consistent gains both in personalized and unseen client evaluation settings
through extensive experiments on several standard benchmark datasets.
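The subspace-continuum idea can be sketched as sampling weights along the segment between the global and local endpoints and averaging the resulting predictions. A hypothetical PyTorch sketch; the paper's actual construction of the low-loss subspace is more involved:

```python
import copy
import torch

def interpolate_state(global_sd, local_sd, lam):
    """Weights at position lam on the segment between the two endpoints."""
    return {k: (1 - lam) * global_sd[k] + lam * local_sd[k] for k in global_sd}

def subspace_ensemble(model, global_sd, local_sd, x, lams=(0.0, 0.5, 1.0)):
    """Average predictions of models sampled along the subspace continuum."""
    probs = []
    for lam in lams:
        m = copy.deepcopy(model)
        m.load_state_dict(interpolate_state(global_sd, local_sd, lam))
        with torch.no_grad():
            probs.append(torch.softmax(m(x), dim=1))
    return torch.stack(probs).mean(dim=0)

model = torch.nn.Linear(10, 3)
g = {k: torch.randn_like(v) for k, v in model.state_dict().items()}
l = {k: torch.randn_like(v) for k, v in model.state_dict().items()}
print(subspace_ensemble(model, g, l, torch.randn(4, 10)))
```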
推理|分析|理解|解释(4篇)
【1】 A literature survey on student feedback assessment tools and their usage in sentiment analysis
标题:学生反馈评价工具及其在情感分析中应用的文献调查
链接:https://arxiv.org/abs/2109.07904
作者:Himali Aryal
机构:Department of Information Technology and Electrical Engineering, Norwegian University of Science and Technology, Gjøvik, Norway
摘要:在线学习正变得越来越流行,无论是为了方便、适应工作时间,还是仅仅为了有从任何地方学习的自由。特别是在新冠病毒-19大流行期间,它已成为唯一可行的学习选择。教授各种理论内容混合的核心编程课程的有效性取决于学生的互动和反应。与通过Zoom或团队进行的数字讲座不同,讲师可能会在物理课程中从学生的面部表情、行为和态度中快速获得此类反应,即使听者基本上是空闲和非互动的。然而,虚拟学习中的学生评价是一项具有挑战性的任务。尽管存在挑战,但不同的技术正在逐步融入教学环境,以提高学生的参与度和动机。在本文中,我们评估了各种课堂反馈评估方法的有效性,如Kahoot!,Mentimeter、Padlet和polling,帮助讲师在整个课程中获得学生的实时反馈,并相应地调整教学风格。此外,学生建议涵盖的一些主题包括导师建议、改进教学风格、课程内容和其他主题。任何输入都会让讲师对如何改善学生的学习体验有宝贵的见解,然而,手动浏览所有定性评论并提取想法是很乏味的。因此,在本文中,我们提出了一个情绪分析模型,用于从学生的定性反馈意见中提取明确的建议。
摘要:Online learning is becoming increasingly popular, whether for convenience, to
accommodate work hours, or simply to have the freedom to study from anywhere.
Especially during the Covid-19 pandemic, it has become the only viable option
for learning. The effectiveness of teaching various hard-core programming
courses with a mix of theoretical content is determined by the student
interaction and responses. In contrast to a digital lecture through Zoom or
Teams, a lecturer may rapidly acquire such responses from students' facial
expressions, behavior, and attitude in a physical session, even if the listener
is largely idle and non-interactive. However, student assessment in virtual
learning is a challenging task. Despite the challenges, different technologies
are progressively being integrated into teaching environments to boost student
engagement and motivation. In this paper, we evaluate the effectiveness of
various in-class feedback assessment methods such as Kahoot!, Mentimeter,
Padlet, and polling to assist a lecturer in obtaining real-time feedback from
students throughout a session and adapting the teaching style accordingly.
Furthermore, some of the topics covered by student suggestions include tutor
suggestions, enhancing teaching style, course content, and other subjects. Any
input gives the instructor valuable insight into how to improve the student's
learning experience; however, manually going through all of the qualitative
comments and extracting the ideas is tedious. Thus, in this paper, we propose a
sentiment analysis model for extracting the explicit suggestions from the
students' qualitative feedback comments.
【2】 Detection Accuracy for Evaluating Compositional Explanations of Units
标题:单元成分解释评价的检测精度
链接:https://arxiv.org/abs/2109.07804
作者:Sayo M. Makinwa,Biagio La Rosa,Roberto Capobianco
机构:Sapienza University of Rome, Sony AI
备注:10 pages, 7 figures
摘要:最近,深度学习模型在解决复杂问题和不同领域中取得了成功,这增加了人们对理解所学内容的兴趣。因此,人们采用了不同的方法来解释这些模型,其中一种方法使用人类可以理解的概念作为解释。使用这种方法的两个例子是网络解剖和组合解释。前者使用原子概念解释单元,而后者使解释更具表现力,用逻辑形式取代原子概念。虽然从直觉上看,逻辑形式比原子概念更具信息性,但如何量化这一改进尚不清楚,它们的评估通常基于搜索过程中优化的同一度量以及待调整的超参数的使用。在本文中,我们建议使用检测准确度作为评估指标,它衡量各单元对其指定解释的检测一致性。我们表明,该度量(1)有效地评估了不同长度的解释,(2)可以用作组合解释搜索的停止标准,消除了解释长度超参数,(3)揭示了新的专门单元,其长度为1的解释是其较长解释的感知抽象。
摘要:The recent success of deep learning models in solving complex problems and in
different domains has increased interest in understanding what they learn.
Therefore, different approaches have been employed to explain these models, one
of which uses human-understandable concepts as explanations. Two examples of
methods that use this approach are Network Dissection and Compositional
explanations. The former explains units using atomic concepts, while the latter
makes explanations more expressive, replacing atomic concepts with logical
forms. While intuitively, logical forms are more informative than atomic
concepts, it is not clear how to quantify this improvement, and their
evaluation is often based on the same metric that is optimized during the
search process and on the usage of hyper-parameters to be tuned. In this paper,
we propose to use as evaluation metric the Detection Accuracy, which measures
units' consistency of detection of their assigned explanations. We show that
this metric (1) evaluates explanations of different lengths effectively, (2)
can be used as a stopping criterion for the compositional explanation search,
eliminating the explanation length hyper-parameter, and (3) exposes new
specialized units whose length 1 explanations are the perceptual abstractions
of their longer explanations.
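One simplified reading of the proposed metric: of the pixels (or samples) where a unit's assigned explanation holds, how often does the thresholded unit activation fire? The sketch below implements that reading; the masks, threshold, and normalization are assumptions rather than the paper's exact definition:

```python
import numpy as np

def detection_accuracy(unit_act, concept_mask, thresh):
    """Fraction of locations where the explanation holds that the unit also
    detects (a simplified stand-in for detection consistency)."""
    fired = unit_act > thresh
    present = concept_mask.astype(bool)
    return (fired & present).sum() / max(present.sum(), 1)

rng = np.random.default_rng(0)
act = rng.random((64, 7, 7))         # unit activations on 64 images
mask = rng.random((64, 7, 7)) > 0.7  # where the assigned explanation holds
print(detection_accuracy(act, mask, thresh=0.5))
```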
【3】 Predicting the outcome of team movements -- Player time series analysis using fuzzy and deep methods for representation learning
标题:团队运动结果的预测--使用模糊和深度表示学习方法的球员时间序列分析
链接:https://arxiv.org/abs/2109.07570
作者:Omid Shokrollahi,Bahman Rohani,Amin Nobakhti
机构: Amirkabir University, Computer engineering department ,Tehran, Iran, Amirkabir University, Computer engineering department Tehran, Iran, Sharif University of Technology, Electrical Engineering department Tehran, Telefax: +
备注:arXiv admin note: text overlap with arXiv:1901.10738 by other authors
摘要:我们提取并使用带有动作类型标签的球员位置时间序列数据,建立一个能够表示球队战术行为模式的模型,并使用该表示预测任意移动的结果。我们提供了一个框架,用于在更长的移动序列或战术计划中对短战术和空间占用进行有效编码。我们研究了一场比赛中控球方反复尝试到达可射门位置的比赛片段。我们使用三角模糊隶属函数设计了一个精心构造且高效的核,为球员在球场不同区域出现的可能性生成多个时间序列。然后,对所得的多元时间序列,使用三元组损失和带指数扩张因果卷积的深度神经网络进行无监督学习。这项工作的关键贡献在于其建模方式:短场景如何对更长的场景产生影响,以及球员如何在球场上占据并创造新的空间。我们在2015-16赛季半个赛季的职业篮球SportVU数据集上讨论了所提方法在预测和识别任务上的有效性。即使在数据量相对较小的情况下,所提出的系统也表现出不错的性能。
摘要:We extract and use player position time-series data, tagged along with the
action types, to build a competent model for representing team tactics
behavioral patterns and use this representation to predict the outcome of
arbitrary movements. We provide a framework for the useful encoding of short
tactics and space occupations in a more extended sequence of movements or
tactical plans. We investigate game segments during a match in which the team
in possession of the ball regularly attempts to reach a position where they can
take a shot at goal for a single game. A carefully designed and efficient
kernel is employed using a triangular fuzzy membership function to create
multiple time series for players' potential of presence at different court
regions. Unsupervised learning is then used for time series using triplet loss
and deep neural networks with exponentially dilated causal convolutions for the
derived multivariate time series. This work's key contribution lies in its
approach to modeling how short scenes contribute to longer ones and how
players occupy and create new spaces on the game court. We discuss the
effectiveness of the proposed approach for prediction and recognition tasks on
the professional basketball SportVU dataset for the 2015-16 half-season. The
proposed system demonstrates decent functionality even with relatively small
data.
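The fuzzy kernel maps a player's coordinate into a degree of presence in a court region via a triangular membership function; a minimal sketch with illustrative region parameters:

```python
import numpy as np

def triangular_membership(x, a, b, c):
    """Triangular fuzzy membership: 0 at a and c, peak 1 at b."""
    x = np.asarray(x, dtype=float)
    left = (x - a) / (b - a)
    right = (c - x) / (c - b)
    return np.clip(np.minimum(left, right), 0.0, 1.0)

# Potential of presence in a court region centred at x = 10 (arbitrary units),
# evaluated along a player's x-coordinate trajectory.
trajectory_x = np.array([4.0, 8.0, 10.0, 12.5, 18.0])
print(triangular_membership(trajectory_x, a=5, b=10, c=15))
```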
【4】 Learning to Aggregate and Refine Noisy Labels for Visual Sentiment Analysis
标题:学习聚合和提炼噪声标签以进行视觉情感分析
链接:https://arxiv.org/abs/2109.07509
作者:Wei Zhu,Zihe Zheng,Haitian Zheng,Hanjia Lyu,Jiebo Luo
机构:University of Rochester
摘要:视觉情感分析近年来受到越来越多的关注。然而,数据集的质量令人担忧,因为情绪标签是众包的、主观的,并且容易出错。这对数据驱动模型(包括深度神经网络)构成了严重威胁,如果训练深度神经网络使其过度拟合带有噪声情感标签的样本,则其在测试用例上的泛化能力较差。受带噪标签学习最新进展的启发,我们提出了一种鲁棒学习方法来执行鲁棒视觉情感分析。我们的方法在训练过程中依靠外部存储器来聚集和过滤噪声标签,从而可以防止模型过度拟合噪声情况。内存由带有相应标签的原型组成,两者都可以在线更新。我们使用公开的数据集建立了一个带有标签噪声的视觉情感分析基准。所提出的基准设置的实验结果全面证明了该方法的有效性。
摘要:Visual sentiment analysis has received increasing attention in recent years.
However, the quality of the dataset is a concern because the sentiment labels
are crowd-sourced, subjective, and prone to mistakes. This poses a severe
threat to the data-driven models including the deep neural networks which would
generalize poorly on the testing cases if they are trained to over-fit the
samples with noisy sentiment labels. Inspired by the recent progress on
learning with noisy labels, we propose a robust learning method to perform
robust visual sentiment analysis. Our method relies on an external memory to
aggregate and filter noisy labels during training and thus can prevent the
model from overfitting the noisy cases. The memory is composed of the
prototypes with corresponding labels, both of which can be updated online. We
establish a benchmark for visual sentiment analysis with label noise using
publicly available datasets. The experimental results under the proposed benchmark
settings comprehensively show the effectiveness of our method.
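A stripped-down version of the prototype memory: keep one online-updated centroid per class and flag samples whose features sit closer to another class's prototype. This is a simplified stand-in for the paper's aggregation-and-filtering module, with hypothetical toy features:

```python
import numpy as np

class PrototypeMemory:
    """Toy prototype memory: online class centroids used to flag samples
    whose given label disagrees with the nearest prototype."""
    def __init__(self, n_classes, dim, momentum=0.9):
        self.protos = np.zeros((n_classes, dim))
        self.m = momentum

    def update(self, feat, label):
        self.protos[label] = self.m * self.protos[label] + (1 - self.m) * feat

    def is_suspicious(self, feat, label):
        d = np.linalg.norm(self.protos - feat, axis=1)
        return d.argmin() != label

mem = PrototypeMemory(n_classes=3, dim=8)
rng = np.random.default_rng(0)
for _ in range(100):
    y = int(rng.integers(0, 3))
    mem.update(rng.normal(size=8) + 3 * y, y)  # class-separated toy features
print(mem.is_suspicious(rng.normal(size=8) + 3 * 2, label=0))  # likely True
```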
检测相关(2篇)
【1】 Urdu text in natural scene images: a new dataset and preliminary text detection
标题:自然场景图像中的乌尔都语文本:一种新的数据集和初步文本检测
链接:https://arxiv.org/abs/2109.08060
作者:Hazrat Ali,Khalid Iqbal,Ghulam Mujtaba,Ahmad Fayyaz,Mohammad Farhad Bulbul,Fazal Wahab Karam,Ali Zahir
机构:Department of Electrical and Computer Engineering, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, Pakistan., Department of Computer Science, COMSATS University Islamabad, Attock Campus, Attock, Pakistan.
备注:None
摘要:用于内容分析的自然场景图像中的文本检测是一项有趣的任务。研究团体已经看到了英语/汉语文本检测的一些重大发展。然而,在自然场景图像中提取乌尔都语文本是一项尚未很好解决的任务。本文首先介绍了一种新的自然场景图像乌尔都语文本数据集。该数据集由从真实场景采集的500幅独立图像组成。其次,采用通道增强的最大稳定极值区域(MSER)方法提取图像中的乌尔都语文本区域作为候选区域。采用两级过滤机制消除非候选区域。在第一阶段,根据文本和噪声的几何特性对它们进行分类。在第二阶段,训练支持向量机分类器来丢弃非文本候选区域。然后,使用基于质心的垂直和水平距离链接文本候选区域。文本行由基于HOG特征的不同分类器进一步分析,以去除非文本区域。在本地开发的数据集上进行了大量实验,以评估性能。实验结果表明,在测试集图像上具有良好的性能。该数据集将可供研究使用。据我们所知,这项工作是乌尔都语的第一项同类工作,将为免费研究使用提供良好的数据集,并作为乌尔都语文本提取任务的基线性能。
摘要:Text detection in natural scene images for content analysis is an interesting
task. The research community has seen some great developments for
English/Mandarin text detection. However, Urdu text extraction in natural scene
images is a task not well addressed. In this work, firstly, a new dataset is
introduced for Urdu text in natural scene images. The dataset comprises of 500
standalone images acquired from real scenes. Secondly, the channel enhanced
Maximally Stable Extremal Region (MSER) method is applied to extract Urdu text
regions as candidates in an image. A two-stage filtering mechanism is applied to
eliminate non-candidate regions. In the first stage, text and noise are
classified based on their geometric properties. In the second stage, a support
vector machine classifier is trained to discard non-text candidate regions.
After this, text candidate regions are linked using centroid-based vertical and
horizontal distances. Text lines are further analyzed by a different classifier
based on HOG features to remove non-text regions. Extensive experimentation is
performed on the locally developed dataset to evaluate the performance. The
experimental results show good performance on test set images. The dataset will
be made available for research use. To the best of our knowledge, the work is
the first of its kind for the Urdu language and would provide a good dataset
for free research use and serve as a baseline performance on the task of Urdu
text extraction.
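The candidate-extraction step maps directly onto OpenCV's MSER detector. A minimal sketch of extracting and geometrically filtering candidate regions; the channel enhancement, SVM stage, and HOG-based line analysis are not reproduced, and the image path is a placeholder:

```python
import cv2
import numpy as np

# Hypothetical input; in practice, load a scene image containing Urdu text.
img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
if img is None:
    img = (np.random.rand(240, 320) * 255).astype(np.uint8)  # placeholder

mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(img)

# Crude geometric filtering, standing in for the paper's first stage.
candidates = [
    (x, y, w, h) for (x, y, w, h) in bboxes
    if 0.1 < w / max(h, 1) < 10 and w * h > 50
]
print(f"{len(candidates)} candidate text regions")
```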
【2】 Detecting Propaganda Techniques in Memes
标题:检测模因中的宣传技术
链接:https://arxiv.org/abs/2109.08013
作者:Dimitar Dimitrov,Bishr Bin Ali,Shaden Shaar,Firoj Alam,Fabrizio Silvestri,Hamed Firooz,Preslav Nakov,Giovanni Da San Martino
机构: Sofia University “St. Kliment Ohridski”, Bulgaria, King’s College London, UK, Qatar Computing Research Institute, HBKU, Qatar, Sapienza University of Rome, Italy, Facebook AI, USA, University of Padova, Italy
备注:None
摘要:宣传可以被定义为一种传播形式,旨在影响人们对特定目标的意见或行动;这是通过定义良好的修辞和心理手段实现的。我们今天所知道的宣传形式可以追溯到17世纪初。然而,正是随着互联网和社交媒体的出现,它开始以比以前更大的规模传播,从而成为重大的社会和政治问题。如今,社交媒体中的大部分宣传是多模式的,混合了文本和视觉内容。有鉴于此,我们提出了一个新的多标签多模态任务:检测模因中使用的宣传技术类型。我们进一步创建并发布了一个包含950个模因的新语料库,用22种宣传技巧仔细注释,可以出现在文本、图像或两者中。我们对语料库的分析表明,共同理解这两种模式对于检测这些技术至关重要。这一点在我们使用几种最先进的多模态模型进行的实验中得到了进一步证实。
摘要:Propaganda can be defined as a form of communication that aims to influence
the opinions or the actions of people towards a specific goal; this is achieved
by means of well-defined rhetorical and psychological devices. Propaganda, in
the form we know it today, can be dated back to the beginning of the 17th
century. However, it is with the advent of the Internet and the social media
that it has started to spread on a much larger scale than before, thus becoming
a major societal and political issue. Nowadays, a large fraction of propaganda in
social media is multimodal, mixing textual with visual content. With this in
mind, here we propose a new multi-label multimodal task: detecting the type of
propaganda techniques used in memes. We further create and release a new corpus
of 950 memes, carefully annotated with 22 propaganda techniques, which can
appear in the text, in the image, or in both. Our analysis of the corpus shows
that understanding both modalities together is essential for detecting these
techniques. This is further confirmed in our experiments with several
state-of-the-art multimodal models.
分类|识别(4篇)
【1】 Humanly Certifying Superhuman Classifiers
标题:人性化认证超人分类器
链接:https://arxiv.org/abs/2109.07867
作者:Qiongkai Xu,Christian Walder,Chenchen Xu
机构: The Australian National University, Canberra, ACT, Australia, Data, CSIRO, Canberra, ACT, Australia
摘要:估计机器学习系统的性能是人工智能研究中的一个长期挑战。今天,这一挑战尤其重要,因为出现了越来越优于人类的系统。在某些情况下,这种“超人”的表现很容易被证明;例如,在传统的双人游戏中击败传奇的人类玩家。另一方面,评估可能超过人类绩效的分类模型可能具有挑战性。事实上,人类注释通常被视为一个基本事实,它隐含地假设人类优于任何基于人类注释训练的模型。事实上,人类的注释者可能会犯错误,而且是主观的。评估与正版oracle相关的性能可能更客观、更可靠,即使查询oracle很昂贵或不可能。在本文中,我们首先提出了一个挑战,即对于一个未被观察到的oracle,评估人和模型的性能。我们发展了一种理论,与甲骨文相比,仅使用不完美的人类注释作为参考,来估计准确度。我们的分析提供了一个简单的方法来检测和证明这种环境下的超人表现,我们相信这将有助于理解当前分类研究的阶段。我们在精心设计的玩具实验中验证了边界的收敛性和我们理论的假设。此外,我们通过元分析大规模自然语言处理任务(不存在oracle)证明了我们的理论的实用性,并表明在我们的假设下,近年来的一些模型具有高概率超人。
摘要:Estimating the performance of a machine learning system is a longstanding
challenge in artificial intelligence research. Today, this challenge is
especially relevant given the emergence of systems which appear to increasingly
outperform human beings. In some cases, this "superhuman" performance is
readily demonstrated; for example by defeating legendary human players in
traditional two player games. On the other hand, it can be challenging to
evaluate classification models that potentially surpass human performance.
Indeed, human annotations are often treated as a ground truth, which implicitly
assumes the superiority of the human over any models trained on human
annotations. In reality, human annotators can make mistakes and be subjective.
Evaluating the performance with respect to a genuine oracle may be more
objective and reliable, even when querying the oracle is expensive or
impossible. In this paper, we first raise the challenge of evaluating the
performance of both humans and models with respect to an oracle which is
unobserved. We develop a theory for estimating the accuracy compared to the
oracle, using only imperfect human annotations for reference. Our analysis
provides a simple recipe for detecting and certifying superhuman performance in
this setting, which we believe will assist in understanding the stage of
current research on classification. We validate the convergence of the bounds
and the assumptions of our theory on carefully designed toy experiments with
known oracles. Moreover, we demonstrate the utility of our theory by
meta-analyzing large-scale natural language processing tasks, for which an
oracle does not exist, and show that under our assumptions a number of models
from recent years are with high probability superhuman.
【2】 Building an Ensemble of Classifiers via Randomized Models of Ensemble Members
标题:利用集成成员的随机化模型构建分类器集成
链接:https://arxiv.org/abs/2109.07861
作者:Pawel Trajdos,Marek Kurzynski
机构:Department of Systems and Computer Networks, Wroc�law University of Science and Technology, Wybrze˙ze Wyspia´nskiego ,-, Wroc�law, Poland
摘要:文献中已知许多动态集成选择(DES)方法。作者先前开发的一种方法是建立一个随机分类器,作为基分类器的模型。该模型在一定概率意义上等价于基分类器。接着,将随机分类器正确分类的概率作为被评估分类器的能力。本文提出了一种新的基分类器随机化模型。在该方法中,模型的随机操作源于从固定大小的学习集族中随机选择学习集。本文介绍了这种方法的数学基础,并展示了在给定学习集和验证集的实际应用中,如何确定能力的度量并使用DES方案构建MC系统。在67个基准数据集上,通过实验评估了具有所提出能力模型的DES方案,并根据八个质量标准与使用先前提出的随机化模型概念的两个集成分类器进行了比较。在几乎所有考察的质量标准下,所提出的方法都获得了最低(即最优)的排名。
摘要:Many dynamic ensemble selection (DES) methods are known in the literature. A
method previously developed by the authors consists in building a randomized
classifier which is treated as a model of the base classifier. The model is
equivalent to the base classifier in a certain probabilistic sense. Next, the
probability of correct classification of the randomized classifier is taken as the
competence of the evaluated classifier.
In this paper, a novel randomized model of base classifier is developed. In
the proposed method, the random operation of the model results from a random
selection of the learning set from the family of learning sets of a fixed size.
The paper presents the mathematical foundations of this approach and shows how,
for a practical application when learning and validation sets are given, one
can determine the measure of competence and build a MC system with the DES
scheme.
The DES scheme with the proposed model of competence was experimentally
evaluated on the collection of 67 benchmark datasets and compared in terms of
eight quality criteria with two ensemble classifiers which use the
previously-proposed concepts of randomized model. The proposed approach
achieved the lowest ranks for almost all investigated quality criteria.
【3】 Soft Confusion Matrix Classifier for Stream Classification
标题:用于流分类的软混淆矩阵分类器
链接:https://arxiv.org/abs/2109.07857
作者:Pawel Trajdos,Marek Kurzynski
机构:Wroclaw University of Science and Technology, Wroclaw, Poland
摘要:本文研究了如何裁剪基于软混淆矩阵(SCM)的分类器来处理流学习任务。这项工作的主要目标是开发一个包装分类器,允许对无法增量学习的分类器进行增量学习。该目标是通过对先前开发的SCM分类器进行两项改进来实现的。第一种方法旨在降低SCM分类器的计算成本。为此,改变了对象模糊邻域的定义。第二个目标是有效处理概念漂移。这是通过使用ADWIN驱动的概念漂移检测器实现的,该检测器不仅用于检测漂移,还用于控制邻域的大小。实验结果表明,该方法明显优于参考方法。
摘要:In this paper, the issue of tailoring the soft confusion matrix (SCM) based
classifier to deal with the stream learning task is addressed. The main goal of the
work is to develop a wrapping-classifier that allows incremental learning to
classifiers that are unable to learn incrementally. The goal is achieved by
making two improvements in the previously developed SCM classifier. The first
one is aimed at reducing the computational cost of the SCM classifier. To do
so, the definition of the fuzzy neighborhood of an object is changed. The
second one is aimed at effective dealing with the concept drift. This is done
by employing the ADWIN-driven concept drift detector that is not only used to
detect the drift but also to control the size of the neighbourhood. The
obtained experimental results show that the proposed approach significantly
outperforms the reference methods.
【4】 Probability-driven scoring functions in combining linear classifiers
标题:组合线性分类器中的概率驱动评分函数
链接:https://arxiv.org/abs/2109.07815
作者:Pawel Trajdos,Robert Burduk
机构:Department of Systems and Computer Networks, Wroclaw University of Science and, Technology, Wybrzeze Wyspianskiego ,-, Wroclaw, Poland
摘要:虽然线性分类器是机器学习中最古老的方法之一,但在机器学习领域仍然非常流行。这是因为它们的计算复杂度低,并且对过拟合具有鲁棒性。因此,线性分类器通常被用作多集成分类系统的基础分类器。本研究旨在建立一种新的线性分类器集成方法。融合方案同时使用测量空间和几何空间。即,我们提出了一个概率驱动的评分函数,其形状取决于基本分类器生成的决策超平面的方向。将提出的融合方法与使用多个基准数据集的参考方法进行了比较。使用多个质量标准进行比较。还对所得结果进行了统计分析。实验研究表明,在一定条件下,可以得到一些改进。
摘要:Although linear classifiers are one of the oldest methods in machine
learning, they are still very popular in the machine learning community. This
is due to their low computational complexity and robustness to overfitting.
Consequently, linear classifiers are often used as base classifiers of multiple
ensemble classification systems. This research is aimed at building a new
fusion method dedicated to the ensemble of linear classifiers. The fusion
scheme uses both measurement space and geometrical space. Namely, we proposed a
probability-driven scoring function whose shape depends on the orientation of
the decision hyperplanes generated by the base classifiers. The proposed fusion
method is compared with the reference method using multiple benchmark datasets
taken from the KEEL repository. The comparison is done using multiple quality
criteria. The statistical analysis of the obtained results is also performed.
The experimental study shows that, under certain conditions, some improvement
may be obtained.
表征(2篇)
【1】 ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations
标题:ObjectFolder:具有隐式视觉、听觉和触觉表示的对象的数据集
链接:https://arxiv.org/abs/2109.07991
作者:Ruohan Gao,Yen-Yu Chang,Shivani Mall,Li Fei-Fei,Jiajun Wu
机构:Stanford University
备注:In CoRL 2021
摘要:多传感器以对象为中心的感知、推理和交互是近年来的一个重要研究课题。然而,这些方向上的进展受到可用对象集的限制——合成对象不够逼真,大多以几何体为中心,而像YCB这样的真实对象数据集由于国际运输、库存和财务成本的原因,往往在获取方面具有挑战性和不稳定性。我们展示了ObjectFolder,这是一个包含100个虚拟化对象的数据集,它通过两个关键创新解决了这两个难题。首先,ObjectFolder对所有对象的视觉、听觉和触觉感官数据进行编码,实现了许多多感官对象识别任务,而不仅仅是专注于对象几何体的现有数据集。其次,ObjectFolder对每个对象的视觉纹理、声学模拟和触觉读数采用统一的、以对象为中心的隐式表示,使数据集使用灵活,易于共享。我们通过在各种基准任务(包括实例识别、跨感官检索、三维重建和机器人抓取)上对数据集进行评估,证明了数据集作为多传感器感知和控制测试平台的有用性。
摘要:Multisensory object-centric perception, reasoning, and interaction have been
a key research topic in recent years. However, the progress in these directions
is limited by the small set of objects available -- synthetic objects are not
realistic enough and are mostly centered around geometry, while real object
datasets such as YCB are often practically challenging and unstable to acquire
due to international shipping, inventory, and financial cost. We present
ObjectFolder, a dataset of 100 virtualized objects that addresses both
challenges with two key innovations. First, ObjectFolder encodes the visual,
auditory, and tactile sensory data for all objects, enabling a number of
multisensory object recognition tasks, beyond existing datasets that focus
purely on object geometry. Second, ObjectFolder employs a uniform,
object-centric, and implicit representation for each object's visual textures,
acoustic simulations, and tactile readings, making the dataset flexible to use
and easy to share. We demonstrate the usefulness of our dataset as a testbed
for multisensory perception and control by evaluating it on a variety of
benchmark tasks, including instance recognition, cross-sensory retrieval, 3D
reconstruction, and robotic grasping.
【2】 Network representation learning systematic review: ancestors and current development state
标题:网络表征学习的系统回顾:前人与发展现状
链接:https://arxiv.org/abs/2109.07583
作者:Amina Amara,Mohamed Ali Hadj Taieb,Mohamed Ben Aouicha
机构: University of Sfax
备注:None
摘要:现实世界中的信息网络越来越多地出现在各种学科中,包括在线社交网络和引用网络。这些网络数据通常具有稀疏性、非线性和异构性的特点,这给网络分析任务带来了不同的挑战,以获取网络数据的固有属性。人工智能和机器学习最近被用作从网络数据中学习见解和应对当前挑战的强大系统。作为机器学习技术的一部分,图嵌入方法最初是针对由特征表示的数据集(如图像数据集)构造的图而设计的,其中明确定义了节点之间的链接。这些传统方法无法应对网络数据挑战。作为一种新的学习范式,网络表征学习被提出来将真实世界的信息网络映射到低维空间,同时保留网络的固有属性。本文对网络表征学习(也称为网络嵌入)从诞生到目前的发展状况进行了系统全面的综述。通过开展的调查,我们全面了解了网络嵌入产生的原因,以及网络嵌入管道中使用的设置和模型类型。因此,我们简要介绍了网络嵌入的表征学习和词表征学习的历史。我们还提供了理解网络表示学习所需的基本概念的正式定义,然后描述了网络嵌入管道。最常用的评估嵌入的下游任务、它们的评估指标和流行的数据集都会突出显示。最后,我们介绍了用于网络嵌入的开源库。
摘要:Real-world information networks are increasingly occurring across various
disciplines including online social networks and citation networks. These
network data are generally characterized by sparseness, nonlinearity and
heterogeneity bringing different challenges to the network analytics task to
capture inherent properties from network data. Artificial intelligence and
machine learning have been recently leveraged as powerful systems to learn
insights from network data and deal with presented challenges. As part of
machine learning techniques, graph embedding approaches are originally
conceived for graphs constructed from feature represented datasets, like image
dataset, in which links between nodes are explicitly defined. These traditional
approaches cannot cope with network data challenges. As a new learning
paradigm, network representation learning has been proposed to map a real-world
information network into a low-dimensional space while preserving inherent
properties of the network. In this paper, we present a systematic comprehensive
survey of network representation learning, known also as network embedding,
from birth to the current development state. Through the undertaken survey, we
provide a comprehensive view of reasons behind the emergence of network
embedding and, types of settings and models used in the network embedding
pipeline. Thus, we introduce a brief history of representation learning and
word representation learning ancestor of network embedding. We provide also
formal definitions of basic concepts required to understand network
representation learning followed by a description of network embedding
pipeline. Most commonly used downstream tasks to evaluate embeddings, their
evaluation metrics and popular datasets are highlighted. Finally, we present
the open-source libraries for network embedding.
编码器(1篇)
【1】 Tied & Reduced RNN-T Decoder
标题:权重绑定与精简的RNN-T解码器
链接:https://arxiv.org/abs/2109.07513
作者:Rami Botros,Tara N. Sainath,Robert David,Emmanuel Guzman,Wei Li,Yanzhang He
机构:Google Inc., U.S.A
备注:None
摘要:以往关于递归神经网络转换器(RNN-T)模型的工作表明,在某些条件下,可以简化其预测网络,而识别精度几乎没有损失(arXiv:2003.07705[eess.AS],[2],arXiv:2012.06749[cs.CL])。这是通过限制先前标签的上下文大小和/或为其各层使用比LSTM更简单的体系结构来实现的。这些变化的好处包括模型尺寸减小、推理速度加快和节能,这些都对端侧应用非常有用。在这项工作中,我们研究了在不降低识别性能的情况下使RNN-T解码器(预测网络+联合网络)更小更快的方法。我们的预测网络对输入嵌入执行简单的加权平均,并与联合网络的输出层共享其嵌入矩阵权重(也称为权重绑定,常用于语言建模arXiv:1611.01462[cs.LG])。这种简单的设计,当与额外的基于编辑的最小贝叶斯风险(EMBR)训练结合使用时,将RNN-T解码器的参数从23M减少到仅2M,而不影响字错误率(WER)。
摘要:Previous works on the Recurrent Neural Network-Transducer (RNN-T) models have
shown that, under some conditions, it is possible to simplify its prediction
network with little or no loss in recognition accuracy (arXiv:2003.07705
[eess.AS], [2], arXiv:2012.06749 [cs.CL]). This is done by limiting the context
size of previous labels and/or using a simpler architecture for its layers
instead of LSTMs. The benefits of such changes include reduction in model size,
faster inference and power savings, which are all useful for on-device
applications.
In this work, we study ways to make the RNN-T decoder (prediction network +
joint network) smaller and faster without degradation in recognition
performance. Our prediction network performs a simple weighted averaging of the
input embeddings, and shares its embedding matrix weights with the joint
network's output layer (a.k.a. weight tying, commonly used in language modeling
arXiv:1611.01462 [cs.LG]). This simple design, when used in conjunction with
additional Edit-based Minimum Bayes Risk (EMBR) training, reduces the RNN-T
Decoder from 23M parameters to just 2M, without affecting word-error rate
(WER).
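Both ideas, averaging the embeddings of the most recent labels and tying the embedding matrix to the joint network's output projection, are easy to sketch in PyTorch. Dimensions, context size, and the unweighted mean below are illustrative simplifications, not the paper's configuration:

```python
import torch
import torch.nn as nn

class TiedReducedPrediction(nn.Module):
    """Sketch of an embedding-averaging prediction network whose embedding
    matrix is shared with the output projection (weight tying)."""
    def __init__(self, vocab_size, dim, context=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, vocab_size, bias=False)
        self.out.weight = self.embed.weight    # weight tying
        self.context = context

    def forward(self, label_history):          # (batch, history_len)
        ctx = label_history[:, -self.context:]
        return self.embed(ctx).mean(dim=1)     # simple (unweighted) averaging

pred = TiedReducedPrediction(vocab_size=4096, dim=256)
g = pred(torch.randint(0, 4096, (8, 5)))       # prediction-network output
logits = pred.out(g)                           # reuses the tied weights
print(logits.shape)                            # torch.Size([8, 4096])
```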
优化|敛散性(4篇)
【1】 A Quadratic Time Locally Optimal Algorithm for NP-hard Equal Cardinality Partition Optimization
标题:NP-Hard等基数划分优化的二次时间局部优化算法
链接:https://arxiv.org/abs/2109.07882
作者:Kaan Gokcesu,Hakan Gokcesu
摘要:我们研究了等基数集划分问题的优化版本(其中等大小划分的和之间的绝对差最小化)。虽然这个问题是NP难问题,通常需要指数复杂度来解决,但我们制定了这个NP难问题的较弱版本,其目标是找到局部最优解。本文所考虑的局部最优性指:在两个分区之间交换任意一对元素都无法改进解。为此,我们设计了一个算法,可以在$O(N^2)$时间和$O(N)$空间中产生这样一个局部最优解。我们的方法不需要正输入或整数输入,并且在任意输入精度下同样有效。因此,它广泛适用于不同的问题场景。
摘要:We study the optimization version of the equal cardinality set partition
problem (where the absolute difference between the equal sized partitions' sums
are minimized). While this problem is NP-hard and requires exponential
complexity to solve in general, we have formulated a weaker version of this
NP-hard problem, where the goal is to find a locally optimal solution. The
local optimality considered in our work is under any swap between the opposing
partitions' element pairs. To this end, we designed an algorithm which can
produce such a locally optimal solution in $O(N^2)$ time and $O(N)$ space. Our
approach does not require positive or integer inputs and works equally well
under arbitrary input precisions. Thus, it is widely applicable in different
problem scenarios.
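The local optimality notion (no single cross-partition swap reduces the absolute difference of sums) can be illustrated with a naive hill-climbing search. This sketch conveys the criterion only; it does not reproduce the paper's algorithm or its $O(N^2)$ guarantee:

```python
def swap_local_search(values):
    """Split `values` into two equal-size halves, then greedily apply
    pairwise swaps across the halves while they reduce |sum(A) - sum(B)|."""
    assert len(values) % 2 == 0
    half = len(values) // 2
    A, B = list(values[:half]), list(values[half:])
    improved = True
    while improved:
        improved = False
        diff = sum(A) - sum(B)
        for i in range(half):
            for j in range(half):
                # Swapping A[i] and B[j] changes the difference by 2*(B[j]-A[i]).
                new_diff = diff + 2 * (B[j] - A[i])
                if abs(new_diff) < abs(diff):
                    A[i], B[j] = B[j], A[i]
                    diff = new_diff
                    improved = True
    return A, B, abs(diff)

A, B, gap = swap_local_search([3.1, 4.7, 0.2, 9.0, 5.5, 1.3, 7.8, 2.4])
print(A, B, gap)
```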
【2】 Optimal Probing with Statistical Guarantees for Network Monitoring at Scale
标题:基于统计保证的大规模网络监测最优探测
链接:https://arxiv.org/abs/2109.07743
作者:Muhammad Jehangir Amjad,Christophe Diot,Dimitris Konomis,Branislav Kveton,Augustin Soule,Xiaolong Yang
机构: Google, MIT, Amazon
摘要:云网络很难监控,因为它们增长迅速,监控它们的预算有限。我们提出了一个用于估计网络指标(如延迟和数据包丢失)的框架,并保证在固定的监控预算下估计错误。我们提出的算法在网络路径上产生探测分布,然后对其进行监控;并基于统计学中的A-和E-最优实验设计。不幸的是,这些设计的计算成本太高,无法在生产规模上使用。我们提出了基于Frank-Wolfe算法的可伸缩近似和近似最优近似。我们在真实网络拓扑的模拟中验证了我们的方法,并在真实的云网络中使用了生产探测系统。与生产和学术基线相比,我们在降低探测预算方面取得了重大进展,同时保持了较低的估计误差,即使探测预算非常低。
摘要:Cloud networks are difficult to monitor because they grow rapidly and the
budgets for monitoring them are limited. We propose a framework for estimating
network metrics, such as latency and packet loss, with guarantees on estimation
errors for a fixed monitoring budget. Our proposed algorithms produce a
distribution of probes across network paths, which we then monitor; and are
based on A- and E-optimal experimental designs in statistics. Unfortunately,
these designs are too computationally costly to use at production scale. We
propose their scalable and near-optimal approximations based on the Frank-Wolfe
algorithm. We validate our approaches in simulation on real network topologies,
and also using a production probing system in a real cloud network. We show
major gains in reducing the probing budget compared to both production and
academic baselines, while maintaining low estimation errors, even with very low
probing budgets.
【3】 Adversarially Regularized Policy Learning Guided by Trajectory Optimization
标题:轨迹优化引导的对抗性正则化策略学习
链接:https://arxiv.org/abs/2109.07627
作者:Zhigen Zhao,Simiao Zuo,Tuo Zhao,Ye Zhao
机构:School of Mechanical Engineering, Georgia Institute of Technology; School of Industrial and Systems Engineering, Georgia Institute of Technology
摘要:将轨迹优化与函数逼近(特别是神经网络)相结合的最新进展显示了在机器人系统中学习复杂控制策略的前景。尽管具有很大的灵活性,但用于参数化控制策略的大型神经网络带来了重大挑战。学习到的神经控制策略往往过于复杂和不平滑,这很容易导致意外或发散的机器人运动。因此,在实际应用中,它们的泛化性能往往很差。为了解决这个问题,我们提出了一种基于轨迹优化的逆向正则化策略学习(VERONICA)来学习平滑控制策略。具体地说,我们提出的方法通过稳定输出控制对输入状态的最坏情况扰动来控制神经控制策略的平滑度(局部Lipschitz连续性)。我们在机器人操作上的实验表明,我们提出的方法不仅提高了神经策略学习的样本效率,而且增强了策略对各种干扰(包括传感器噪声、环境不确定性和模型失配)的鲁棒性。
摘要:Recent advancement in combining trajectory optimization with function
approximation (especially neural networks) shows promise in learning complex
control policies for diverse tasks in robot systems. Despite their great
flexibility, the large neural networks for parameterizing control policies
impose significant challenges. The learned neural control policies are often
overcomplex and non-smooth, which can easily cause unexpected or diverging
robot motions. Therefore, they often yield poor generalization performance in
practice. To address this issue, we propose adVErsarially Regularized pOlicy
learNIng guided by trajeCtory optimizAtion (VERONICA) for learning smooth
control policies. Specifically, our proposed approach controls the smoothness
(local Lipschitz continuity) of the neural control policies by stabilizing the
output control with respect to the worst-case perturbation to the input state.
Our experiments on robot manipulation show that our proposed approach not only
improves the sample efficiency of neural policy learning but also enhances the
robustness of the policy against various types of disturbances, including
sensor noise, environmental uncertainty, and model mismatch.
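The regularizer penalizes how much the policy output can change under a bounded worst-case perturbation of the input state. A common way to approximate the inner maximization is a random-start, one-step signed-gradient ascent, sketched below as one plausible instantiation rather than the paper's exact procedure; the task loss is a hypothetical placeholder:

```python
import torch

def smoothness_penalty(policy, state, eps=0.05):
    """Penalize ||policy(s + delta) - policy(s)||^2 under a one-step
    worst-case perturbation with ||delta||_inf <= eps (random-start
    FGSM-style approximation of the inner maximization)."""
    delta = (0.5 * eps * torch.randn_like(state)).requires_grad_(True)
    base = policy(state).detach()
    diff = (policy(state + delta) - base).pow(2).sum()
    grad, = torch.autograd.grad(diff, delta)
    adv = state + torch.clamp(delta.detach() + eps * grad.sign(), -eps, eps)
    return (policy(adv) - policy(state)).pow(2).mean()

policy = torch.nn.Sequential(torch.nn.Linear(12, 64), torch.nn.Tanh(),
                             torch.nn.Linear(64, 4))
states = torch.randn(32, 12)
task_loss = policy(states).pow(2).mean()   # hypothetical placeholder task loss
total = task_loss + 0.1 * smoothness_penalty(policy, states)
total.backward()
```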
【4】 Non-smooth Bayesian Optimization in Tuning Problems
标题:调谐问题中的非光滑贝叶斯优化
链接:https://arxiv.org/abs/2109.07563
作者:Hengrui Luo,James W. Demmel,Younghyun Cho,Xiaoye S. Li,Yang Liu
机构:Lawrence Berkeley National Laboratory, Computational Research Department, Cyclotron Rd, Berkeley, CA , USA, University of California, Berkeley, Department of EECS, Cory Hall, Berkeley, CA , USA, Editor:
备注:61 pages
摘要:当我们试图学习未知的黑盒函数时,构建代理模型是一种常见的方法。贝叶斯优化提供了一个框架,它允许我们基于从函数中提取的顺序样本构建代理模型,并找到最优值。调整算法参数以优化大型复杂“黑箱”应用程序代码的性能是一个特定的重要应用,其目的是找到黑箱函数的最优值。在贝叶斯优化框架内,高斯过程模型产生平滑或连续的样本路径。然而,调整问题中的黑盒函数通常是非光滑的。由于我们通常从黑箱函数中获得有限的顺序样本,这一困难的调优问题变得更加严重。基于调谐过程中遇到的这些问题,我们提出了一种新的加性高斯过程模型,称为聚类高斯过程(cGP),其中加性成分由聚类产生。在我们所研究的例子中,在重复实验中,性能可以提高高达90%。通过使用此代理模型,我们希望捕获黑盒函数的非光滑性。除了构造该模型的算法外,我们还将该模型应用于若干人工和实际应用中,以对其进行评估。
摘要:Building surrogate models is one common approach when we attempt to learn
unknown black-box functions. Bayesian optimization provides a framework which
allows us to build surrogate models based on sequential samples drawn from the
function and find the optimum. Tuning algorithmic parameters to optimize the
performance of large, complicated "black-box" application codes is a specific
important application, which aims at finding the optima of black-box functions.
Within the Bayesian optimization framework, the Gaussian process model produces
smooth or continuous sample paths. However, the black-box function in the
tuning problem is often non-smooth. This difficult tuning problem is worsened
by the fact that we usually have limited sequential samples from the black-box
function. Motivated by these issues encountered in tuning, we propose a novel
additive Gaussian process model called clustered Gaussian process (cGP), where
the additive components are induced by clustering. In the examples we studied,
the performance can be improved by as much as 90% among repetitive experiments.
By using this surrogate model, we want to capture the non-smoothness of the
black-box function. In addition to an algorithm for constructing this model, we
also apply the model to several artificial and real applications to evaluate
it.
预测|估计(10篇)
【1】 Associative Memories via Predictive Coding
标题:基于预测编码的联想存储器
链接:https://arxiv.org/abs/2109.08063
作者:Tommaso Salvatori,Yuhang Song,Yujian Hong,Simon Frieder,Lei Sha,Zhenghua Xu,Rafal Bogacz,Thomas Lukasiewicz
机构:Department of Computer Science, University of Oxford, UK, University of Oxford, Oxford, UK, State Key Laboratory, Hebei University of Technology, Tianjin, China, MRC Brain Network Dynamics Unit
备注:24 pages, 18 figures
摘要:大脑中的联想记忆接收并存储由感觉神经元记录的活动模式,并能在必要时检索它们。由于联想记忆在人类智能中的重要性,联想记忆的计算模型已经发展了几十年。它们包括自联想存储器,允许存储数据点,并在提供数据点$s$的含噪或不完整变体时检索所存储的$s$,以及能够存储和调用多模态数据的异联想存储器。在本文中,我们提出了一种新的实现联想记忆的神经模型,基于通过感觉神经元接收外部刺激的分层生成网络。该模型使用预测编码(predictive coding)进行训练,这是一种受大脑皮层信息处理启发的基于误差的学习算法。为了测试该模型的功能,我们从损坏和不完整的数据点执行了多个检索实验。在广泛的比较中,我们发现这种新模型在检索精度和鲁棒性方面优于流行的联想记忆模型,如通过反向传播训练的自动编码器和现代Hopfield网络。特别是,在补全部分数据点时,我们的模型在自然图像数据集(如ImageNet)上取得了令人惊讶的高精度结果,即使仅显示原始图像的一小部分像素。此外,我们还证明了该方法能够处理多模态数据,从描述中检索图像,反之亦然。最后,我们讨论了这项工作在神经科学界可能产生的影响,表明我们的模型为研究大脑中记忆的学习和提取提供了一个合理的框架,因为它紧密地模仿了海马体作为记忆索引和生成模型的行为。
摘要:Associative memories in the brain receive and store patterns of activity
registered by the sensory neurons, and are able to retrieve them when
necessary. Due to their importance in human intelligence, computational models
of associative memories have been developed for several decades now. They
include autoassociative memories, which allow for storing data points and
retrieving a stored data point $s$ when provided with a noisy or partial
variant of $s$, and heteroassociative memories, able to store and recall
multi-modal data. In this paper, we present a novel neural model for realizing
associative memories, based on a hierarchical generative network that receives
external stimuli via sensory neurons. This model is trained using predictive
coding, an error-based learning algorithm inspired by information processing in
the cortex. To test the capabilities of this model, we perform multiple
retrieval experiments from both corrupted and incomplete data points. In an
extensive comparison, we show that this new model outperforms in retrieval
accuracy and robustness popular associative memory models, such as autoencoders
trained via backpropagation, and modern Hopfield networks. In particular, in
completing partial data points, our model achieves remarkable results on
natural image datasets, such as ImageNet, with a surprisingly high accuracy,
even when only a tiny fraction of pixels of the original images is presented.
Furthermore, we show that this method is able to handle multi-modal data,
retrieving images from descriptions, and vice versa. We conclude by discussing
the possible impact of this work in the neuroscience community, by showing that
our model provides a plausible framework to study learning and retrieval of
memories in the brain, as it closely mimics the behavior of the hippocampus as
a memory index and generative model.
【2】 A Machine Learning Framework for Automatic Prediction of Human Semen Motility
标题:一种用于人类精液运动自动预测的机器学习框架
链接:https://arxiv.org/abs/2109.08049
作者:Sandra Ottl,Maurice Gerczuk,Shahin Amiriparian,Björn Schuller
摘要:在生殖健康领域,检测男性生育问题的一个重要方面是分析人类精液质量。两个重要因素是精子细胞的形态和活力。前者描述精子不同部位的缺陷,后者测量细胞的有效运动。对于许多非人类物种来说,所谓的计算机辅助精子分析系统能够很好地从显微镜下的视频记录中评估这些特征,但人类精子样本通常显示出较高程度的碎片和死亡精子,以及较低的整体精子活率。在这里,利用大量训练数据提取显著特征的机器学习方法可以帮助医生检测生育问题或体外受精程序。在这项工作中,通过结合无监督特征提取方法和下游回归模型的机器学习框架,预测给定精子样本的整体运动能力。本文评估的模型改进了基于视频的精子运动预测的最新技术。
摘要:In the field of reproductive health, a vital aspect for the detection of male
fertility issues is the analysis of human semen quality. Two factors of
importance are the morphology and motility of the sperm cells. While the former
describes defects in different parts of a spermatozoon, the latter measures the
efficient movement of cells. For many non-human species, so-called
Computer-Aided Sperm Analysis systems work well for assessing these
characteristics from microscopic video recordings but struggle with human sperm
samples which generally show higher degrees of debris and dead spermatozoa, as
well as lower overall sperm motility. Here, machine learning methods that
harness large amounts of training data to extract salient features could
support physicians with the detection of fertility issues or in vitro
fertilisation procedures. In this work, the overall motility of given sperm
samples is predicted with the help of a machine learning framework integrating
unsupervised methods for feature extraction with downstream regression models.
The models evaluated herein improve on the state-of-the-art for video-based
sperm-motility prediction.
【3】 Predicting Users' Value Changes by the Friends' Influence from Social Media Usage
标题:利用好友对社交媒体使用的影响预测用户价值变化
链接:https://arxiv.org/abs/2109.08021
作者:Md. Saddam Hossain Mukta,Ahmed Shahriar Sakib,Md. Adnanul Islam,Mohiuddin Ahmed,Mumshad Ahamed Rifat
机构: United International University, Dhaka, Bangladesh, American International University Bangladesh, Military Institute of Science and Technology, Edith Cowan University, Australia
备注:None
摘要:人类的基本价值观代表了一系列价值观,如安全、独立、成功、善良和快乐,我们认为这些价值观对我们的生活很重要。我们每个人都持有不同的价值观,具有不同程度的重要性。现有研究表明,一个人的价值观可以从他们的社交网络使用情况中识别出来。然而,由于生活经历、影响、社会结构和技术等不同因素,一个人的价值优先权可能会随着时间的推移而变化。现有的研究没有对用户价值观从社会影响(即群体说服)到社交媒体使用的变化进行任何分析。在我们的研究中,首先,我们通过朋友对社交媒体使用的影响来预测用户的价值分数。我们从Facebook上275个不同的自我网络中提出了一个基于有限信心模型(BCM)的价值动力学模型,该模型预测了社会影响如何说服一个人随时间改变他们的价值。然后,为了更好地预测,我们使用了基于粒子群优化的超参数调整技术。我们观察到,这些优化的超参数产生准确的未来价值分数。我们还使用不同的基于机器学习的方法运行我们的方法,发现支持向量回归(SVR)优于其他回归模型。通过使用具有最佳BCM模型超参数的SVR,我们发现最小均方误差(MSE)得分为0.00347。
摘要:Basic human values represent a set of values such as security, independence,
success, kindness, and pleasure, which we deem important to our lives. Each of
us holds different values with different degrees of significance. Existing
studies show that values of a person can be identified from their social
network usage. However, the value priority of a person may change over time due
to different factors such as life experiences, influence, social structure and
technology. Existing studies do not conduct any analysis regarding the change
of users' value from the social influence, i.e., group persuasion, from the
social media usage. In our research, first, we predict users' value score by
the influence of friends from their social media usage. We propose a Bounded
Confidence Model (BCM) based value dynamics model from 275 different ego
networks in Facebook that predicts how social influence may persuade a person
to change their value over time. Then, to predict better, we use particle swarm
optimization based hyperparameter tuning technique. We observe that these
optimized hyperparameters produce accurate future value scores. We also run our
approach with different machine learning based methods and find that support
vector regression (SVR) outperforms the other regression models. By using SVR
with the best hyperparameters of the BCM model, we find the lowest Mean Squared
Error (MSE) score of 0.00347.
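As a rough illustration of the Bounded Confidence Model dynamics the abstract builds on, the sketch below performs one synchronous update in which each user moves toward the mean value of friends whose values lie within a confidence bound. The bound `epsilon`, step size `mu`, and adjacency-matrix representation are illustrative assumptions, not the paper's fitted model.

```python
import numpy as np

def bcm_step(values, adjacency, epsilon=0.2, mu=0.5):
    """One bounded-confidence update over an ego network: user i is only
    persuaded by friends whose values are within epsilon of their own."""
    new = values.copy()
    for i in range(len(values)):
        friends = np.flatnonzero(adjacency[i])
        close = friends[np.abs(values[friends] - values[i]) <= epsilon]
        if close.size > 0:
            new[i] = (1 - mu) * values[i] + mu * values[close].mean()
    return new
```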
【4】 SAFRAN: An interpretable, rule-based link prediction method outperforming embedding models
标题:SAFRAN:一种性能优于嵌入模型的可解释的、基于规则的链接预测方法
链接:https://arxiv.org/abs/2109.08002
作者:Simon Ott,Christian Meilicke,Matthias Samwald
机构:Institute of Artificial Intelligence and Decision Support, Medical University of Vienna, Austria, Data and Web Science Research Group, University Mannheim, Germany
摘要:基于神经嵌入的机器学习模型在预测知识图中的新链接方面表现出了良好的前景。不幸的是,它们的实用性由于缺乏可解释性而被削弱。最近,完全可解释的、基于规则的算法AnyBURL在许多通用链路预测基准上产生了极具竞争力的结果。然而,当前聚合多个规则所做预测的方法受到冗余的影响。我们通过引入SAFRAN规则应用框架对AnyBURL进行了改进,该框架使用了一种称为非冗余Noisy-OR的新聚合方法,在聚合之前检测并聚类冗余规则。SAFRAN在已建立的通用基准FB15K-237、WN18RR和YAGO3-10上为完全可解释链路预测提供了最新的结果。此外,它在FB15K-237和WN18RR上超越了多个已建立的基于嵌入的算法的结果,并缩小了YAGO3-10上基于规则与基于嵌入的算法之间的差距。
摘要:Neural embedding-based machine learning models have shown promise for
predicting novel links in knowledge graphs. Unfortunately, their practical
utility is diminished by their lack of interpretability. Recently, the fully
interpretable, rule-based algorithm AnyBURL yielded highly competitive results
on many general-purpose link prediction benchmarks. However, current approaches
for aggregating predictions made by multiple rules are affected by
redundancies. We improve upon AnyBURL by introducing the SAFRAN rule
application framework, which uses a novel aggregation approach called
Non-redundant Noisy-OR that detects and clusters redundant rules prior to
aggregation. SAFRAN yields new state-of-the-art results for fully interpretable
link prediction on the established general-purpose benchmarks FB15K-237, WN18RR
and YAGO3-10. Furthermore, it exceeds the results of multiple established
embedding-based algorithms on FB15K-237 and WN18RR and narrows the gap between
rule-based and embedding-based algorithms on YAGO3-10.
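The aggregation step can be pictured as follows: redundant rules are first grouped, only the strongest confidence per group survives, and the survivors are combined with the standard Noisy-OR formula. The dictionary-based interface below is a hedged sketch of that rule, not SAFRAN's code; how the redundancy clusters are found is the learned part of the framework.

```python
def noisy_or(confidences):
    """Standard Noisy-OR: probability that at least one rule fires correctly."""
    prob_all_wrong = 1.0
    for c in confidences:
        prob_all_wrong *= (1.0 - c)
    return 1.0 - prob_all_wrong

def non_redundant_noisy_or(rule_conf, rule_cluster):
    """Keep only the strongest rule per redundancy cluster, then Noisy-OR."""
    best = {}
    for rule, conf in rule_conf.items():
        c = rule_cluster[rule]
        best[c] = max(best.get(c, 0.0), conf)
    return noisy_or(best.values())
```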
【5】 Auditing Fairness and Imputation Impact in Predictive Analytics for Higher Education
标题:高等教育预测分析中的公平性审计与插补影响
链接:https://arxiv.org/abs/2109.07908
作者:Hadis Anahideh,Nazanin Nezami,Denisa Gándara
机构: University of Illinois at Chicago, The University of Texas at Austin
摘要:如今,学院和大学以各种方式使用预测分析来提高学生的成功率。尽管预测分析具有潜力,但在高等教育中采用预测分析存在两大障碍:(a)部署方面缺乏民主化;(b)可能加剧不平等。教育研究者和决策者在实践中部署预测模型时遇到了许多挑战。这些挑战存在于建模的不同步骤中,包括数据准备、模型开发和评估。然而,如果执行不当,这些步骤中的每一个都会给系统带来额外的偏差。大多数具有全国代表性的大规模教育数据集都受到研究参与者大量不完整回答的影响。缺失值是许多数据分析挑战背后的常见潜在原因。虽然许多与教育相关的研究解决了数据缺失的挑战,但对于处理缺失值对实践中预测结果公平性的影响知之甚少。在本文中,我们首先评估大学生成功预测模型结果的差异,然后使用一套综合的通用指标研究插补技术对模型性能和公平性的影响。对真实大规模教育数据集的综合分析揭示了关于建模差异的关键见解,以及不同插补技术在对学生成功预测结果公平性的影响方面如何相互进行根本性比较。
摘要:Nowadays, colleges and universities use predictive analytics in a variety of
ways to increase student success rates. Despite the potential of predictive
analytics, there exist two major barriers to their adoption in higher
education: (a) the lack of democratization in deployment, and (b) the potential
to exacerbate inequalities. Education researchers and policymakers encounter
numerous challenges in deploying predictive modeling in practice. These
challenges arise in different steps of modeling, including data preparation,
model development, and evaluation. Nevertheless, each of these steps can
introduce additional bias to the system if not appropriately performed. Most
large-scale and nationally representative education data sets suffer from a
significant number of incomplete responses from the research participants.
Missing values are a frequent latent cause behind many data analysis
challenges. While many education-related studies addressed the challenges of
missing data, little is known about the impact of handling missing values on
the fairness of predictive outcomes in practice.
In this paper, we set out to first assess the disparities in predictive
modeling outcome for college-student success, then investigate the impact of
imputation techniques on the model performance and fairness using a
comprehensive set of common metrics. The comprehensive analysis of a real
large-scale education dataset reveals key insights on the modeling disparity
and how different imputation techniques fundamentally compare to one another in
terms of their impact on the fairness of the student-success predictive
outcome.
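A hedged sketch of the audit loop the paper describes: impute missing values with different strategies, train the same downstream model, and compare both overall performance and a simple group-fairness gap. The imputers, classifier, and accuracy-gap metric below are illustrative stand-ins for the paper's comprehensive metric suite.

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def audit_imputation(X, y, group):
    """X: features with NaNs, y: success label, group: protected attribute."""
    X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
        X, y, group, test_size=0.3, random_state=0)
    for name, imp in {"mean": SimpleImputer(strategy="mean"),
                      "knn": KNNImputer(n_neighbors=5)}.items():
        clf = LogisticRegression(max_iter=1000)
        clf.fit(imp.fit_transform(X_tr), y_tr)
        pred = clf.predict(imp.transform(X_te))
        accs = [accuracy_score(y_te[g_te == g], pred[g_te == g])
                for g in np.unique(g_te)]
        print(f"{name}: overall={accuracy_score(y_te, pred):.3f} "
              f"group gap={max(accs) - min(accs):.3f}")
```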
【6】 Predicting students' performance in online courses using multiple data sources
标题:使用多数据源预测学生在网络课程中的表现
链接:https://arxiv.org/abs/2109.07903
作者:Mélina Verger,Hugo Jair Escalante
机构:Paris-Saclay University
摘要:数据驱动的决策服务于教育并改变教育。我们通过使用来自在线课程的多个数据源(包括我们创建的一个)来解决预测学生成绩的问题。实验结果就该任务应考虑哪些数据给出了初步结论。
摘要:Data-driven decision making is serving and transforming education. We
approached the problem of predicting students' performance by using multiple
data sources which came from online courses, including one we created.
Experimental results show preliminary conclusions towards which data are to be
considered for the task.
【7】 Incentives in Two-sided Matching Markets with Prediction-enhanced Preference-formation
标题:具有预测增强偏好形成的双边匹配市场中的激励
链接:https://arxiv.org/abs/2109.07835
作者:Stefania Ionescu,Yuhao Du,Kenneth Joseph,Anikó Hannák
机构: University of Z¨urich, University at Buffalo
摘要:在缺乏监管交易所的情况下,双边配对市场长期存在,以配对代理人。一个常见的例子是学校选择,匹配机制使用学生和学校偏好将学生分配到学校。在这种情况下,形成偏好既困难又关键。先前的工作已经提出了各种各样的预测机制来帮助代理做出关于他们偏好的决定。尽管这些匹配和预测机制通常一起部署,但它们几乎总是单独分析的。目前的研究表明,在这两者的交叉点存在着一种以前未被探索过的战略行为:回到市场(如学校)的代理人可以通过与他们的对手进行短期非最佳互动来攻击未来预测。在这里,我们首先介绍这种类型的战略行为,我们称之为“对抗性交互攻击”。接下来,我们构建了一个正式的经济模型,该模型捕获了用于辅助代理的预测机制和用于配对代理的匹配机制之间的反馈回路。该经济模型允许我们分析对抗性交互攻击。最后,以学校选择为例,我们构建了一个模拟,表明随着对预测的信任度和准确性的增加,学校通过发起对抗性交互攻击获得的收益越来越多。我们还表明,这种攻击加剧了学生群体中的不平等。
摘要:Two-sided matching markets have long existed to pair agents in the absence of
regulated exchanges. A common example is school choice, where a matching
mechanism uses student and school preferences to assign students to schools. In
such settings, forming preferences is both difficult and critical. Prior work
has suggested various prediction mechanisms that help agents make decisions
about their preferences. Although often deployed together, these matching and
prediction mechanisms are almost always analyzed separately. The present work
shows that at the intersection of the two lies a previously unexplored type of
strategic behavior: agents returning to the market (e.g., schools) can attack
future predictions by interacting short-term non-optimally with their matches.
Here, we first introduce this type of strategic behavior, which we call an
`adversarial interaction attack'. Next, we construct a formal economic model
that captures the feedback loop between prediction mechanisms designed to
assist agents and the matching mechanism used to pair them. This economic model
allows us to analyze adversarial interaction attacks. Finally, using school
choice as an example, we build a simulation to show that, as the trust in and
accuracy of predictions increases, schools gain progressively more by
initiating an adversarial interaction attack. We also show that this attack
increases inequality in the student population.
【8】 A Comparative Study of Machine Learning Methods for Predicting the Evolution of Brain Connectivity from a Baseline Timepoint
标题:从基线时间点预测大脑连通性演变的机器学习方法的比较研究
链接:https://arxiv.org/abs/2109.07739
作者:Şeymanur Aktı,Doğay Kamar,Özgür Anıl Özlü,Ihsan Soydemir,Muhammet Akcan,Abdullah Kul,Islem Rekik
机构:BASIRA lab, Istanbul Technical University, Istanbul, Turkey, School of Science and Engineering, Computing, University of Dundee, UK
摘要:通过预测连接成对解剖区域的连接权重的变化,预测大脑网络(也称为连接组)的进化,可以在早期发现连接相关的神经系统疾病,并检测潜在连接组异常的发展。值得注意的是,这种具有挑战性的预测问题在预测性连接组学文献中仍然是探索最少的。众所周知,机器学习(ML)方法已在各种计算机视觉问题中证明了其预测能力。然而,专门用于从单个时间点预测大脑连接性进化轨迹的ML技术几乎不存在。为了填补这一空白,我们组织了一个Kaggle竞赛,其中20个参赛团队设计了先进的机器学习管道,用于从单个时间点预测大脑连通性的进化。竞争团队结合数据预处理、降维和学习方法开发了他们的ML管道。利用包容性评估方法,我们根据两个互补的评估指标(平均绝对误差(MAE)和皮尔逊相关系数(PCC))对方法进行排名,并使用不同的训练和测试数据扰动策略(单随机分割和交叉验证)对其性能进行排名。最终排名是使用所有评估措施和验证策略中每个竞争团队的排名产品计算的。为了支持开放科学,GitHub上提供了开发的20 ML管道以及connectomic数据集。本次竞赛的结果预计将导致预测模型的进一步发展,该模型可以预测大脑连接性随时间的演变,以及其他类型的网络(如遗传网络)。
摘要:Predicting the evolution of the brain network, also called connectome, by
foreseeing changes in the connectivity weights linking pairs of anatomical
regions makes it possible to spot connectivity-related neurological disorders
in earlier stages and detect the development of potential connectomic
anomalies. Remarkably, such a challenging prediction problem remains least
explored in the predictive connectomics literature. It is a known fact that
machine learning (ML) methods have proven their predictive abilities in a wide
variety of computer vision problems. However, ML techniques specifically
tailored for the prediction of brain connectivity evolution trajectory from a
single timepoint are almost absent. To fill this gap, we organized a Kaggle
competition where 20 competing teams designed advanced machine learning
pipelines for predicting the brain connectivity evolution from a single
timepoint. The competing teams developed their ML pipelines with a combination
of data pre-processing, dimensionality reduction, and learning methods.
Utilizing an inclusive evaluation approach, we ranked the methods based on two
complementary evaluation metrics (mean absolute error (MAE) and Pearson
Correlation Coefficient (PCC)) and their performances using different training
and testing data perturbation strategies (single random split and
cross-validation). The final rank was calculated using the rank product for
each competing team across all evaluation measures and validation strategies.
In support of open science, the 20 developed ML pipelines, along with the
connectomic dataset, are made available on GitHub. The outcomes of this
competition are anticipated to lead to the further development of predictive
models that can foresee the evolution of brain connectivity over time, as well
as other types of networks (e.g., genetic networks).
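The final ranking rule described above (rank product across evaluation measures and validation strategies) can be computed in a few lines. Here each column of `scores` is one evaluation setting, oriented so that lower is better; using MAE directly and 1 - PCC for the correlation metric is an assumption about that orientation.

```python
import numpy as np
from scipy.stats import rankdata

def rank_product(scores):
    """scores: (n_teams, n_settings) array, lower is better per column.
    Returns the product of each team's per-setting ranks; the team with
    the smallest rank product wins."""
    ranks = np.column_stack([rankdata(scores[:, j])
                             for j in range(scores.shape[1])])
    return ranks.prod(axis=1)
```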
【9】 Data-Driven Theory-guided Learning of Partial Differential Equations using SimultaNeous Basis Function Approximation and Parameter Estimation (SNAPE)
标题:利用同时基函数逼近和参数估计(SNAPE)的数据驱动理论指导的偏微分方程学习
链接:https://arxiv.org/abs/2109.07471
作者:Sutanu Bhowmick,Satish Nagarajaiah
机构:Department of Civil and Environmental Engineering, Rice University, Houston, TX , Department of Mechanical Engineering
备注:24 pages, 14 figures, Submitted to Science Advances
摘要:利用测量的各种物理过程的时空响应来推断控制偏微分方程(PDE)。我们提出了同步基函数逼近和参数估计(SNAPE),这是一种PDE的参数估计技术,通过同时将基函数拟合到测量响应并估计常微分方程和偏微分方程的参数,对接近100%的高噪声水平具有鲁棒性。一般多维过程的领域知识被用作优化框架制定中的约束。SNAPE不仅证明了它在各种复杂动态系统上的适用性,这些系统涵盖了广泛的科学领域,包括Schrödinger方程、混沌Duffing振子和Navier-Stokes方程,而且还估算了过程响应的解析近似。该方法系统地结合了成熟科学理论的知识和数据科学的概念,从观测数据推断过程的性质。
摘要:The measured spatiotemporal response of various physical processes is
utilized to infer the governing partial differential equations (PDEs). We
propose SimultaNeous Basis Function Approximation and Parameter Estimation
(SNAPE), a technique of parameter estimation of PDEs that is robust against
high levels of noise (nearly 100%), by simultaneously fitting basis functions to
the measured response and estimating the parameters of both ordinary and
partial differential equations. The domain knowledge of the general
multidimensional process is used as a constraint in the formulation of the
optimization framework. SNAPE not only demonstrates its applicability on
various complex dynamic systems that encompass wide scientific domains
including the Schr\"odinger equation, chaotic Duffing oscillator, and Navier-Stokes
equation but also estimates an analytical approximation to the process
response. The method systematically combines the knowledge of well-established
scientific theories and the concepts of data science to infer the properties of
the process from the observed data.
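A toy version of the simultaneous fit-and-estimate idea, for the scalar ODE u' = a*u: approximate u with a polynomial basis and alternate between solving for the basis coefficients (while penalizing violation of the ODE) and re-estimating the parameter a. The basis choice, penalty weight, and alternating scheme are simplifications of SNAPE, shown only to convey the principle.

```python
import numpy as np

# Toy data: noisy samples of u' = a*u with true a = -1.5.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0, 200)
y = np.exp(-1.5 * t) + 0.05 * rng.standard_normal(t.size)

# Polynomial basis Phi (columns t^0 .. t^deg) and its derivative dPhi.
deg = 8
Phi = np.vander(t, deg + 1, increasing=True)
dPhi = np.column_stack([k * t ** (k - 1) if k > 0 else np.zeros_like(t)
                        for k in range(deg + 1)])

lam, a = 10.0, 0.0
for _ in range(50):
    # Step 1: fit basis coefficients c, penalizing violation of u' = a*u.
    A = np.vstack([Phi, np.sqrt(lam) * (dPhi - a * Phi)])
    b = np.concatenate([y, np.zeros(t.size)])
    c = np.linalg.lstsq(A, b, rcond=None)[0]
    # Step 2: re-estimate the ODE parameter a from the fitted curve.
    u, du = Phi @ c, dPhi @ c
    a = float(u @ du) / float(u @ u)
print("estimated a:", a)  # should land near -1.5 despite the noise
```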
【10】 DeepMTS: Deep Multi-task Learning for Survival Prediction in Patients with Advanced Nasopharyngeal Carcinoma using Pretreatment PET/CT
标题:DeepMTS:治疗前PET/CT用于晚期鼻咽癌患者生存预测的深度多任务学习
链接:https://arxiv.org/abs/2109.07711
作者:Mingyuan Meng,Bingxin Gu,Lei Bi,Shaoli Song,David Dagan Feng,Jinman Kim
机构:a School of Computer Science, the University of Sydney, Sydney, Australia., b Department of Nuclear Medicine, Fudan University Shanghai Cancer Center; Department of Oncology, Shanghai Medical
备注:Under Review
摘要:鼻咽癌是一种世界性的恶性上皮癌。生存预测是鼻咽癌患者的主要关注点,因为它提供了指导治疗所需的早期预后信息。最近,利用深度神经网络(DNN)来学习图像模式的深度表示的深度学习已经被引入到包括鼻咽癌在内的各种癌症的生存预测中。据报道,图像衍生的端到端深部生存模型在预后表现方面可能优于临床预后指标和传统的基于放射组学的生存模型。然而,深层生存模型,尤其是3D模型,需要大量的图像训练数据来避免过度拟合。不幸的是,由于PET/CT扫描的高成本,医学图像数据通常很少,尤其是正电子发射断层扫描/计算机断层扫描(PET/CT)。与仅提供肿瘤解剖信息的磁共振成像(MRI)或计算机断层扫描(CT)相比,同时提供解剖(来自CT)和代谢(来自PET)信息的PET/CT有望实现更准确的生存预测。然而,我们还没有发现任何适用于鼻咽癌患者小PET/CT数据的3D端到端深部生存模型。在本研究中,我们将多任务学习的概念引入到深度生存模型中,以解决小数据导致的过度拟合问题。将肿瘤分割作为辅助任务,以提高模型从稀少PET/CT数据中学习的效率。基于这一思想,我们提出了一种用于联合生存预测和肿瘤分割的三维端到端深度多任务生存模型(DeepMTS)。我们的DeepMTS可以使用170例晚期鼻咽癌患者的PET/CT数据联合学习生存预测和肿瘤分割。
摘要:Nasopharyngeal Carcinoma (NPC) is a worldwide malignant epithelial cancer.
Survival prediction is a major concern for NPC patients, as it provides early
prognostic information that is needed to guide treatments. Recently, deep
learning, which leverages Deep Neural Networks (DNNs) to learn deep
representations of image patterns, has been introduced to the survival
prediction in various cancers including NPC. It has been reported that
image-derived end-to-end deep survival models have the potential to outperform
clinical prognostic indicators and traditional radiomics-based survival models
in prognostic performance. However, deep survival models, especially 3D models,
require large image training data to avoid overfitting. Unfortunately, medical
image data is usually scarce, especially for Positron Emission
Tomography/Computed Tomography (PET/CT) due to the high cost of PET/CT
scanning. Compared to Magnetic Resonance Imaging (MRI) or Computed Tomography
(CT) providing only anatomical information of tumors, PET/CT that provides both
anatomical (from CT) and metabolic (from PET) information is promising to
achieve more accurate survival prediction. However, we have not identified any
3D end-to-end deep survival model that applies to small PET/CT data of NPC
patients. In this study, we introduced the concept of multi-task learning into
deep survival models to address the overfitting problem resulting from small
data. Tumor segmentation was incorporated as an auxiliary task to enhance the
model's efficiency of learning from scarce PET/CT data. Based on this idea, we
proposed a 3D end-to-end Deep Multi-Task Survival model (DeepMTS) for joint
survival prediction and tumor segmentation. Our DeepMTS can jointly learn
survival prediction and tumor segmentation using PET/CT data of only 170
patients with advanced NPC.
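The multi-task setup can be sketched as a shared encoder with two heads whose losses are summed, so the auxiliary segmentation loss regularizes the survival branch on scarce data. Layer sizes, the dummy targets, the BCE segmentation loss, and the 0.5 weighting below are illustrative assumptions, not the DeepMTS architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMTS(nn.Module):
    """Shared 3D encoder feeding a survival head and a segmentation head."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU())
        self.seg_head = nn.Conv3d(8, 1, 1)  # voxel-wise tumor logits
        self.surv_head = nn.Sequential(nn.AdaptiveAvgPool3d(1), nn.Flatten(),
                                       nn.Linear(8, 1))  # scalar risk score
    def forward(self, x):
        h = self.encoder(x)
        return self.surv_head(h), self.seg_head(h)

model = TinyMTS()
x = torch.randn(2, 1, 16, 16, 16)            # two tiny PET/CT-like volumes
risk, seg = model(x)
target_risk = torch.randn(2, 1)              # placeholder survival target
target_mask = torch.randint(0, 2, seg.shape).float()
loss = F.mse_loss(risk, target_risk) \
     + 0.5 * F.binary_cross_entropy_with_logits(seg, target_mask)
loss.backward()                              # both tasks train the encoder
```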
其他神经网络|深度学习|模型|建模(11篇)
【1】 Studying Up Machine Learning Data: Why Talk About Bias When We Mean Power?
标题:研究机器学习数据:当我们指的是权力时,为什么要谈论偏见?
链接:https://arxiv.org/abs/2109.08131
作者:Milagros Miceli,Julian Posada,Tianling Yang
机构:Technische Universität Berlin, Weizenbaum Institute, University of Toronto, Schwartz Reisman Institute, Weizenbaum Institute
备注:Accepted at ACM Group 2022. Forthcoming on Proceedings of the ACM on Human-Computer Interaction
摘要:机器学习(ML)的研究主要认为,在不完整或有偏差的数据集上训练的模型可能导致歧视性输出。在这篇评论中,我们建议通过采用一种权力意识的视角来“研究”ML数据集,从而将研究重点从偏倚导向的框架转移出去。这意味着要考虑历史上的不平等、劳动条件和数据中所包含的认识论观点。我们利用HCI和CSCW的工作来支持我们的论点,批判性地分析先前的研究,并指出我们社区中两条共存的工作路线——一条是以偏见为导向的,另一条是以权力为导向的。通过这种方式,我们强调了在三个领域开展对话与合作的必要性:数据质量、数据工作和数据文档。在第一个方面,我们认为将社会问题简化为“偏见”会忽略数据基于上下文的本质。在第二篇文章中,我们强调了参与数据工作者劳动的企业力量和市场需求,这些数据工作者随后形成了ML数据集。最后,我们建议在数据集文档中扩展当前以透明度为导向的工作,以反映数据设计和生产的社会背景。
摘要:Research in machine learning (ML) has primarily argued that models trained on
incomplete or biased datasets can lead to discriminatory outputs. In this
commentary, we propose moving the research focus beyond bias-oriented framings
by adopting a power-aware perspective to "study up" ML datasets. This means
accounting for historical inequities, labor conditions, and epistemological
standpoints inscribed in data. We draw on HCI and CSCW work to support our
argument, critically analyze previous research, and point at two co-existing
lines of work within our community -- one bias-oriented, the other power-aware.
This way, we highlight the need for dialogue and cooperation in three areas:
data quality, data work, and data documentation. In the first area, we argue
that reducing societal problems to "bias" misses the context-based nature of
data. In the second one, we highlight the corporate forces and market
imperatives involved in the labor of data workers that subsequently shape ML
datasets. Finally, we propose expanding current transparency-oriented efforts
in dataset documentation to reflect the social contexts of data design and
production.
【2】 TruthfulQA: Measuring How Models Mimic Human Falsehoods
标题:真相问答:测量模型如何模仿人类的错误
链接:https://arxiv.org/abs/2109.07958
作者:Stephanie Lin,Jacob Hilton,Owain Evans
备注:The TruthfulQA benchmark and evaluation code is available at this https URL
摘要:我们提出了一个基准来衡量语言模型在生成问题答案时是否真实。该基准包括817个问题,涉及38个类别,包括卫生、法律、金融和政治。我们精心设计了一些人类会因为错误的信仰或误解而错误回答的问题。为了表现良好,模型必须避免从模仿人类文本中获得错误答案。我们测试了GPT-3、GPT Neo/J、GPT-2和基于T5的模型。最佳模型在58%的问题上是真实的,而人的表现是94%。模型产生了许多模仿流行误解的错误答案,并有可能欺骗人类。最大的模型通常最不真实。例如,6B参数GPT-J模型的真实性比125M参数对应模型低17%。这与其他NLP任务不同,NLP任务的性能随着模型大小的增加而提高。但是,如果从训练分布中学习到错误答案,则可以预期此结果。我们认为,与使用训练目标(而不是模仿网络文本)进行微调相比,单独扩大模型对于提高真实性的希望较小。
摘要:We propose a benchmark to measure whether a language model is truthful in
generating answers to questions. The benchmark comprises 817 questions that
span 38 categories, including health, law, finance and politics. We crafted
questions that some humans would answer falsely due to a false belief or
misconception. To perform well, models must avoid generating false answers
learned from imitating human texts. We tested GPT-3, GPT-Neo/J, GPT-2 and a
T5-based model. The best model was truthful on 58% of questions, while human
performance was 94%. Models generated many false answers that mimic popular
misconceptions and have the potential to deceive humans. The largest models
were generally the least truthful. For example, the 6B-parameter GPT-J model
was 17% less truthful than its 125M-parameter counterpart. This contrasts with
other NLP tasks, where performance improves with model size. However, this
result is expected if false answers are learned from the training distribution.
We suggest that scaling up models alone is less promising for improving
truthfulness than fine-tuning using training objectives other than imitation of
text from the web.
【3】 Learning logic programs through divide, constrain, and conquer
标题:通过划分、约束和征服学习逻辑程序
链接:https://arxiv.org/abs/2109.07818
作者:Andrew Cropper
机构:University of Oxford
备注:Under review
摘要:我们介绍了一种归纳逻辑编程方法,它将经典的分治搜索与现代约束驱动搜索相结合。我们的anytime方法可以学习最优、递归和大型程序,并支持谓词发明。我们在三个领域(分类、归纳一般游戏和程序合成)上的实验表明,我们的方法可以提高预测精度并减少学习时间。
摘要:We introduce an inductive logic programming approach that combines classical
divide-and-conquer search with modern constraint-driven search. Our anytime
approach can learn optimal, recursive, and large programs and supports
predicate invention. Our experiments on three domains (classification,
inductive general game playing, and program synthesis) show that our approach
can increase predictive accuracies and reduce learning times.
【4】 Neural-network acceleration of projection-based model-order-reduction for finite plasticity: Application to RVEs
标题:基于投影的有限塑性模型降阶的神经网络加速:在RVEs中的应用
链接:https://arxiv.org/abs/2109.07747
作者:S. Vijayaraghavan,L. Wu,L. Noels,S. P. A. Bordas,S. Natarajan,L. A. A. Beex
机构:University of Luxembourg, Avenue de la Fonte, Esch-sur-Alzette, Luxembourg; Legato-Team, University of Liège, Computational & Multiscale Mechanics of Materials, Quartier Polytech, allée de la Découverte, Liège, Belgium
摘要:与传统的基于投影的模型降阶相比,它的神经网络加速具有在线模拟不需要方程的优点,这意味着不需要迭代求解方程组。因此,无需构造刚度矩阵,且应力更新只需每增量计算一次。在此贡献中,开发了一个递归神经网络,以加速RVE弹塑性力学行为的基于投影的模型降阶。与仅模拟宏观变形(路径)和宏观应力之间关系的神经网络不同,基于投影的模型降阶的神经网络加速保留了所有微观结构信息,代价是每增量计算一次该信息。
摘要:Compared to conventional projection-based model-order-reduction, its
neural-network acceleration has the advantage that the online simulations are
equation-free, meaning that no system of equations needs to be solved
iteratively. Consequently, no stiffness matrix needs to be constructed and the
stress update needs to be computed only once per increment. In this
contribution, a recurrent neural network is developed to accelerate a
projection-based model-order-reduction of the elastoplastic mechanical
behaviour of an RVE. In contrast to a neural network that merely emulates the
relation between the macroscopic deformation (path) and the macroscopic stress,
the neural network acceleration of projection-based model-order-reduction
preserves all microstructural information, at the price of computing this
information once per increment.
【5】 Machine learning with quantum field theories
标题:基于量子场论的机器学习
链接:https://arxiv.org/abs/2109.07730
作者:Dimitrios Bachtis,Gert Aarts,Biagio Lucini
机构:Swansea University, Department of Mathematics, Bay Campus, SA,EN, Swansea, Wales, United Kingdom, Department of Physics, Singleton Campus, SA,PP, Swansea, Wales, United Kingdom, European Centre for Theoretical Studies in Nuclear Physics and Related Areas (ECT)
备注:Presentation at the 38th International Symposium on Lattice Field Theory, 26th-30th July 2021, Massachusetts Institute of Technology, USA
摘要:离散化欧几里德场论与某种概率图形模型(即马尔可夫随机场的数学框架)之间的精确等价性为从量子场论的角度研究机器学习提供了机会。在这篇文章中,我们将通过Hammersley-Clifford定理证明,正方形晶格上的$\phi^{4}$标量场理论满足局部马尔可夫性质,因此可以重新表示为马尔可夫随机场。然后,我们将从$\phi^{4}$理论中推导机器学习算法和神经网络,这些算法和神经网络可视为传统神经网络结构的推广。最后,我们将介绍基于最小化$\phi^{4}$机器学习算法的概率分布和目标概率分布之间的不对称距离的应用程序。
摘要:The precise equivalence between discretized Euclidean field theories and a
certain class of probabilistic graphical models, namely the mathematical
framework of Markov random fields, opens up the opportunity to investigate
machine learning from the perspective of quantum field theory. In this
contribution we will demonstrate, through the Hammersley-Clifford theorem, that
the $\phi^{4}$ scalar field theory on a square lattice satisfies the local
Markov property and can therefore be recast as a Markov random field. We will
then derive from the $\phi^{4}$ theory machine learning algorithms and neural
networks which can be viewed as generalizations of conventional neural network
architectures. Finally, we will conclude by presenting applications based on
the minimization of an asymmetric distance between the probability distribution
of the $\phi^{4}$ machine learning algorithms and target probability
distributions.
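For readers who want the connection spelled out, one common lattice parameterization of the theory is shown below; conventions differ between papers, so treat the constants as illustrative rather than the authors' exact choice.

```latex
% Lattice phi^4 action with hopping parameter kappa and coupling lambda
S[\phi] = \sum_{x}\Big[-\kappa\sum_{\mu}\phi_{x}\,\phi_{x+\hat{\mu}}
        + \phi_{x}^{2} + \lambda\,\big(\phi_{x}^{2}-1\big)^{2}\Big],
\qquad
p(\phi) = \frac{e^{-S[\phi]}}{Z}.
```

Because $S[\phi]$ contains only on-site terms and nearest-neighbour products, $p(\phi)$ factorizes over the cliques of the lattice graph; this locality is exactly what the Hammersley-Clifford theorem needs to identify the Boltzmann distribution with a Markov random field.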
【6】 BacHMMachine: An Interpretable and Scalable Model for Algorithmic Harmonization for Four-part Baroque Chorales
标题:BacHMMachine:一个可解释、可扩展的巴洛克四声部众赞歌算法和声模型
链接:https://arxiv.org/abs/2109.07623
作者:Yunyao Zhu,Stephen Hahn,Simon Mak,Yue Jiang,Cynthia Rudin
机构: Duke University
备注:7 pages, 7 figures
摘要:算法协调——根据乐曲的旋律线自动协调乐曲——是一个具有挑战性的问题,引起了音乐理论家和计算机科学家的极大兴趣。一个特别有趣的流派是巴赫的四部巴洛克合唱。算法合唱协调的方法通常采用黑箱、“数据驱动”方法:它们不明确地整合音乐理论的原则,而是依赖于用大量合唱数据训练的复杂学习模型。我们提出了一个新的协调模型,称为BacHMMachine,它采用了一个由音乐创作原则指导的“理论驱动”框架,以及一个“数据驱动”的模型,用于在这个框架内学习作曲特征。顾名思义,BacHMMachine使用了一种基于键和和弦转换的新型隐马尔可夫模型,为从给定旋律线学习键调制和和弦进行提供了一个概率框架。这允许产生创造性的,但在音乐上连贯的合唱和声;与最先进的算法协调方法相比,整合构图原则可以形成一个更简单的模型,从而大大减少计算负担,提高解释性,而不会影响协调质量或音乐性。我们通过综合实验和图灵测试,将BacHMMachine与现有方法进行比较,证明了这一改进。
摘要:Algorithmic harmonization - the automated harmonization of a musical piece
given its melodic line - is a challenging problem that has garnered much
interest from both music theorists and computer scientists. One genre of
particular interest is the four-part Baroque chorales of J.S. Bach. Methods for
algorithmic chorale harmonization typically adopt a black-box, "data-driven"
approach: they do not explicitly integrate principles from music theory but
rely on a complex learning model trained with a large amount of chorale data.
We propose instead a new harmonization model, called BacHMMachine, which
employs a "theory-driven" framework guided by music composition principles,
along with a "data-driven" model for learning compositional features within
this framework. As its name suggests, BacHMMachine uses a novel Hidden Markov
Model based on key and chord transitions, providing a probabilistic framework
for learning key modulations and chordal progressions from a given melodic
line. This allows for the generation of creative, yet musically coherent
chorale harmonizations; integrating compositional principles allows for a much
simpler model that results in vast decreases in computational burden and
greater interpretability compared to state-of-the-art algorithmic harmonization
methods, at no penalty to quality of harmonization or musicality. We
demonstrate this improvement via comprehensive experiments and Turing tests
comparing BacHMMachine to existing methods.
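The probabilistic core described above is a standard HMM decode: hidden key/chord states emit melody notes, and the most likely harmonic path is recovered with Viterbi. The sketch below assumes the transition, emission, and initial log-probabilities have already been learned; all names and the two-state example are illustrative rather than BacHMMachine's API.

```python
import numpy as np

def viterbi(melody, states, log_init, log_trans, log_emit):
    """Most likely hidden chord/key sequence for an observed melody line.
    log_trans[i, j]: prev state i -> state j; log_emit[j, o]: state j
    emits note o. These stand in for the quantities the model learns."""
    T, S = len(melody), len(states)
    dp = np.empty((T, S))
    back = np.zeros((T, S), dtype=int)
    dp[0] = log_init + log_emit[:, melody[0]]
    for t in range(1, T):
        scores = dp[t - 1][:, None] + log_trans      # scores[i, j]
        back[t] = scores.argmax(axis=0)
        dp[t] = scores.max(axis=0) + log_emit[:, melody[t]]
    path = [int(dp[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [states[s] for s in reversed(path)]

# Two illustrative chord states harmonizing three melody notes (0, 1, 2):
states = ["C", "G"]
log_init = np.log([0.6, 0.4])
log_trans = np.log([[0.7, 0.3], [0.4, 0.6]])
log_emit = np.log([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2], states, log_init, log_trans, log_emit))
```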
【7】 Multi-Task Learning with Sequence-Conditioned Transporter Networks
标题:基于序列条件转运器网络的多任务学习
链接:https://arxiv.org/abs/2109.07578
作者:Michael H. Lim,Andy Zeng,Brian Ichter,Maryam Bandari,Erwin Coumans,Claire Tomlin,Stefan Schaal,Aleksandra Faust
摘要:使机器人能够解决多种操作任务具有广泛的工业应用。虽然基于学习的方法具有灵活性和通用性,但扩展这些方法以解决此类组合任务仍然是一个挑战。在这项工作中,我们从序列条件化(sequence-conditioning)和加权采样的视角来解决多任务学习问题。首先,我们提出了一套专门针对组合任务的新基准MultiRavens,它允许通过任务模块定义自定义任务组合,这些任务模块受工业任务的启发,并举例说明了基于视觉的学习和规划方法的困难。其次,我们提出了一种基于视觉的端到端系统架构,即序列条件传输网络,它通过序列条件和加权采样来增强目标条件传输网络,并能够有效地学习解决多任务长时程问题。我们的分析表明,不仅新框架显著提高了10个新的多任务基准问题上的抓取与放置(pick-and-place)性能,而且带加权采样的多任务学习可以极大地提高单个任务上的学习和代理性能。
摘要:Enabling robots to solve multiple manipulation tasks has a wide range of
industrial applications. While learning-based approaches enjoy flexibility and
generalizability, scaling these approaches to solve such compositional tasks
remains a challenge. In this work, we aim to solve multi-task learning through
the lens of sequence-conditioning and weighted sampling. First, we propose a
new suite of benchmarks specifically aimed at compositional tasks, MultiRavens,
which allows defining custom task combinations through task modules that are
inspired by industrial tasks and exemplify the difficulties in vision-based
learning and planning methods. Second, we propose a vision-based end-to-end
system architecture, Sequence-Conditioned Transporter Networks, which augments
Goal-Conditioned Transporter Networks with sequence-conditioning and weighted
sampling and can efficiently learn to solve multi-task long horizon problems.
Our analysis suggests that not only does the new framework significantly
improve pick-and-place performance on 10 novel multi-task benchmark problems,
but also that multi-task learning with weighted sampling can vastly improve
learning and agent performance on individual tasks.
【8】 Learning the Regularization in DCE-MR Image Reconstruction for Functional Imaging of Kidneys
标题:肾脏功能成像DCE-MR图像重建的正则化学习
链接:https://arxiv.org/abs/2109.07548
作者:Aziz Koçanaoğulları,Cemre Ariyurek,Onur Afacan,Sila Kurugol
摘要:肾脏DCE-MRI旨在通过估计示踪动力学(TK)模型参数,对肾脏解剖进行定性评估,并对肾功能进行定量评估。准确估计TK模型参数需要精确测量高时间分辨率的动脉输入功能(AIF)。加速成像用于实现高时间分辨率,从而在重建图像中产生欠采样伪影。压缩感知(CS)方法提供了多种重建选项。最常见的是,时间差异的稀疏性被鼓励用于正则化以减少伪影。CS方法中增加正则化可以消除环境伪影,但也会在时间上过度平滑信号,从而降低参数估计精度。在这项工作中,我们提出了一种单图像训练的深度神经网络,在不降低功能成像标记准确性的情况下减少MRI采样伪影。我们不是在优化中使用惩罚项进行正则化,而是通过从低维表示生成图像来促进正则化。在这篇手稿中,我们激发并解释了低维输入设计。我们将我们的方法与具有多重正则化权重的CS重建进行比较。提出的方法产生的肾脏生物标记物与使用CS重建估计的基础真值标记物高度相关,CS重建优化用于功能分析。同时,该方法减少了重建图像中的伪影。
摘要:Kidney DCE-MRI aims at both qualitative assessment of kidney anatomy and
quantitative assessment of kidney function by estimating the tracer kinetic
(TK) model parameters. Accurate estimation of TK model parameters requires an
accurate measurement of the arterial input function (AIF) with high temporal
resolution. Accelerated imaging is used to achieve high temporal resolution,
which yields under-sampling artifacts in the reconstructed images. Compressed
sensing (CS) methods offer a variety of reconstruction options. Most commonly,
sparsity of temporal differences is encouraged for regularization to reduce
artifacts. Increasing regularization in CS methods removes the ambient
artifacts but also over-smooths the signal temporally which reduces the
parameter estimation accuracy. In this work, we propose a single image trained
deep neural network to reduce MRI under-sampling artifacts without reducing the
accuracy of functional imaging markers. Instead of regularizing with a penalty
term in optimization, we promote regularization by generating images from a
lower dimensional representation. In this manuscript we motivate and explain
the lower dimensional input design. We compare our approach to CS
reconstructions with multiple regularization weights. The proposed approach results
in kidney biomarkers that are highly correlated with the ground truth markers
estimated using the CS reconstruction which was optimized for functional
analysis. At the same time, the proposed approach reduces the artifacts in the
reconstructed images.
【9】 MOFSimplify: Machine Learning Models with Extracted Stability Data of Three Thousand Metal-Organic Frameworks
标题:MOFSimplify:基于三千个金属有机骨架提取稳定性数据的机器学习模型
链接:https://arxiv.org/abs/2109.08098
作者:A. Nandy,G. Terrones,N. Arunachalam,C. Duan,D. W. Kastner,H. J. Kulik
机构:Department of Chemical Engineering, Massachusetts Institute of Technology, MA 02139; Department of Chemistry, MA 02139; Department of Biological Engineering
摘要:我们报告了一个工作流程和基于自然语言处理(NLP)的程序的输出,以挖掘现有的金属有机框架(MOF)文献,这些文献描述了MOF的结构特征及其溶剂去除和热稳定性。我们从文本挖掘中获得了2000多个溶剂去除稳定性指标,从热重分析数据中获得了3000个热分解温度。我们通过与手动标记的子集进行比较来评估NLP方法的有效性和提取数据的准确性。机器学习(ML,即人工神经网络)模型使用基于图和孔隙几何结构的表示对该数据进行训练,从而能够预测具有量化不确定性的新MOF的稳定性。我们的web界面MOFSimplify为用户提供了对我们策划的数据的访问,并使他们能够利用这些数据对新MOF进行预测。MOFSimplify还鼓励社区反馈现有数据和ML模型预测,以基于社区的主动学习改进MOF稳定性模型。
摘要:We report a workflow and the output of a natural language processing
(NLP)-based procedure to mine the extant metal-organic framework (MOF)
literature describing structurally characterized MOFs and their solvent removal
and thermal stabilities. We obtain over 2,000 solvent removal stability
measures from text mining and 3,000 thermal decomposition temperatures from
thermogravimetric analysis data. We assess the validity of our NLP methods and
the accuracy of our extracted data by comparing to a hand-labeled subset.
Machine learning (ML, i.e. artificial neural network) models trained on this
data using graph- and pore-geometry-based representations enable prediction of
stability on new MOFs with quantified uncertainty. Our web interface,
MOFSimplify, provides users access to our curated data and enables them to
harness that data for predictions on new MOFs. MOFSimplify also encourages
community feedback on existing data and on ML model predictions for
community-based active learning for improved MOF stability models.
【10】 Behavior of Keyword Spotting Networks Under Noisy Conditions
标题:关键词检测网络在噪声环境下的行为
链接:https://arxiv.org/abs/2109.07930
作者:Anwesh Mohanty,Adrian Frischknecht,Christoph Gerum,Oliver Bringmann
机构:Indian Institute of Technology Bombay, Mumbai, India, University of Tübingen, Tübingen, Germany
备注:None
摘要:随着人工智能和智能设备的发展,关键字识别(KWS)正成为一种普遍的需求。该领域最近的工作集中在几种不同的体系结构上,以在低到中等噪声的数据集上获得良好的结果。然而,我们的实验表明,在高噪声条件下,这些模型的性能会恶化。在本文中,我们对各种噪声条件下最先进的KWS网络进行了广泛的比较。我们还建议将自适应批量归一化作为一种技术,在训练阶段噪声文件未知时提高网络性能。这种高噪声特性的结果使未来的工作能够开发在上述条件下性能更好的模型。
摘要:Keyword spotting (KWS) is becoming a ubiquitous need with the advancement in
artificial intelligence and smart devices. Recent work in this field have
focused on several different architectures to achieve good results on datasets
with low to moderate noise. However, the performance of these models
deteriorates under high noise conditions as shown by our experiments. In our
paper, we present an extensive comparison between state-of-the-art KWS networks
under various noisy conditions. We also suggest adaptive batch normalization as
a technique to improve the performance of the networks when the noise files are
unknown during the training phase. The results of such high noise
characterization enable future work in developing models that perform better in
the aforementioned conditions.
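The adaptive batch normalization idea the authors suggest can be sketched as follows: keep the trained weights, but re-estimate the BN running statistics on unlabeled audio from the new noise condition before evaluating. This is a generic AdaBN recipe in PyTorch, assuming a standard BN-based KWS model and a `noisy_loader` yielding batches from the target condition; it is not the paper's exact procedure.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def adapt_bn(model, noisy_loader):
    """Re-estimate BatchNorm statistics on target-condition audio,
    keeping all trained weights fixed."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.reset_running_stats()   # forget training-time statistics
            m.momentum = None         # accumulate a plain running average
    model.train()                     # BN only updates stats in train mode
    for batch, _labels in noisy_loader:   # labels are not needed
        model(batch)
    model.eval()
    return model
```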
【11】 Directed degree corrected mixed membership model and estimating community memberships in directed networks
标题:有向度修正的混合成员模型及有向网络中社区成员的估计
链接:https://arxiv.org/abs/2109.07826
作者:Huan Qing
机构:School of Mathematics, China University of Mining and Technology
摘要:本文研究了有向网络中节点的社区成员关系的建模和估计问题,其中每一行(列)节点与确定其在每一行(列)社区中的成员关系的向量相关联。为了对这种有向网络进行建模,我们提出了考虑度异质性的有向度修正混合隶属度(DiDCMM)模型。当考虑程度异质性时,DiDCMM在混合成员网络的流行条件下是可识别的。基于群体邻接矩阵的左奇异向量的归一化形式所固有的锥结构和右奇异向量所固有的单纯形结构,我们构建了一个高效的算法DiMSC来推断行节点和列节点的社区成员向量。通过利用DiMSC的等价算法(返回与DiMSC相同的估计值)和行奇异向量偏差的最新发展,我们通过在DiDCMM下为每个行节点和每个列节点的推断隶属度向量提供误差界,证明了该算法在温和条件下是渐近一致的。通过模拟研究对理论进行了补充。
摘要:This paper considers the problem of modeling and estimating community
memberships of nodes in a directed network where every row (column) node is
associated with a vector determining its membership in each row (column)
community. To model such directed network, we propose directed degree corrected
mixed membership (DiDCMM) model by considering degree heterogeneity. DiDCMM is
identifiable under popular conditions for mixed membership network when
considering degree heterogeneity. Based on the cone structure inherent in the
normalized version of the left singular vectors and the simplex structure
inherent in the right singular vectors of the population adjacency matrix, we
build an efficient algorithm called DiMSC to infer the community membership
vectors for both row nodes and column nodes. By taking the advantage of DiMSC's
equivalence algorithm which returns same estimations as DiMSC and the recent
development on row-wise singular vector deviation, we show that the proposed
algorithm is asymptotically consistent under mild conditions by providing error
bounds for the inferred membership vectors of each row node and each column
node under DiDCMM. The theory is supplemented by a simulation study.
其他(23篇)
【1】 Field Study in Deploying Restless Multi-Armed Bandits: Assisting Non-Profits in Improving Maternal and Child Health
标题:部署无休止多臂老虎机的实地研究:协助非营利性组织改善妇幼保健
链接:https://arxiv.org/abs/2109.08075
作者:Aditya Mate,Lovish Madaan,Aparna Taneja,Neha Madhiwalla,Shresth Verma,Gargi Singh,Aparna Hegde,Pradeep Varakantham,Milind Tambe
机构: Google Research India, Harvard University, ARMMAN, Singapore Management University
摘要:手机的普及使非营利组织能够及时向其受益人提供重要的健康信息。本文描述了我们的工作,以帮助非营利组织在怀孕期间和分娩后利用自动消息传递程序向受益人(新母亲和准母亲)及时提供预防保健信息。不幸的是,这类信息传递计划的一个关键挑战是,相当一部分受益人退出了该计划。然而,非营利组织通常只有有限的卫生工作者资源(时间)来进行关键的服务呼叫,以便与受益人进行实时互动,以防止参与度下降。为了帮助非营利组织优化这一有限的资源,我们开发了一个无休止多臂老虎机(Restless Multi-Armed Bandits, RMAB)系统。该系统的一个关键技术贡献是一种新的离线历史数据聚类方法,用于推断未知的RMAB参数。我们的第二个主要贡献是与非政府组织合作,通过真实世界的服务质量改进研究,评估我们的RMAB系统。这项研究在7周的时间里对23003名参与者的服务电话优化策略进行了比较,以减少参与度下降。我们发现,与其他对照组相比,RMAB组提供了统计上显著的改善,减少了约30%的参与度下降。据我们所知,这是第一次证明RMAB在现实世界公共卫生环境中的效用的研究。我们正在将我们的RMAB系统过渡到NGO以供实际使用。
摘要:The widespread availability of cell phones has enabled non-profits to deliver
critical health information to their beneficiaries in a timely manner. This
paper describes our work to assist non-profits that employ automated messaging
programs to deliver timely preventive care information to beneficiaries (new
and expecting mothers) during pregnancy and after delivery. Unfortunately, a
key challenge in such information delivery programs is that a significant
fraction of beneficiaries drop out of the program. Yet, non-profits often have
limited health-worker resources (time) to place crucial service calls for live
interaction with beneficiaries to prevent such engagement drops. To assist
non-profits in optimizing this limited resource, we developed a Restless
Multi-Armed Bandits (RMABs) system. One key technical contribution in this
system is a novel clustering method of offline historical data to infer unknown
RMAB parameters. Our second major contribution is evaluation of our RMAB system
in collaboration with an NGO, via a real-world service quality improvement
study. The study compared strategies for optimizing service calls to 23003
participants over a period of 7 weeks to reduce engagement drops. We show that
the RMAB group provides statistically significant improvement over other
comparison groups, reducing ~ 30% engagement drops. To the best of our
knowledge, this is the first study demonstrating the utility of RMABs in real
world public health settings. We are transitioning our RMAB system to the NGO
for real-world use.
【2】 WildWood: a new Random Forest algorithm
标题:Wildwood:一种新的随机森林算法
链接:https://arxiv.org/abs/2109.08010
作者:Stéphane Gaïffas,Ibrahim Merad,Yiyang Yu
机构:LPSM, University of Paris
摘要:我们介绍了WildWood(WW),一种用于随机森林(RF)类型监督学习的新集成算法。虽然标准RF算法使用自举袋外样本计算袋外得分,但WW使用这些样本生成改进的预测,这些预测由森林中每棵完全生长的树的所有可能子树的预测聚合而成。这是通过对袋外样本计算指数权重的聚合实现的,得益于一种称为上下文树加权(context tree weighting)的算法,指数权重的计算准确且非常高效。这种改进,结合直方图策略来加速分割查找,使WW与其他成熟的集成方法(如标准RF和极端梯度提升算法)相比,具有快速性和竞争力。
摘要:We introduce WildWood (WW), a new ensemble algorithm for supervised learning
of Random Forest (RF) type. While standard RF algorithms use bootstrap
out-of-bag samples to compute out-of-bag scores, WW uses these samples to
produce improved predictions given by an aggregation of the predictions of all
possible subtrees of each fully grown tree in the forest. This is achieved by
aggregation with exponential weights computed over out-of-bag samples, that are
computed exactly and very efficiently thanks to an algorithm called context
tree weighting. This improvement, combined with a histogram strategy to
accelerate split finding, makes WW fast and competitive compared with other
well-established ensemble methods, such as standard RF and extreme gradient
boosting algorithms.
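The aggregation principle is easiest to see in brute force: each candidate subtree gets weight exp(-eta * OOB loss), and the forest prediction is the weight-normalized mixture. WildWood computes this exactly and efficiently with context tree weighting; the brute-force sketch below only shows the form of the estimator, with `eta` and the squared loss as illustrative choices.

```python
import numpy as np

def exp_weight_aggregate(subtree_preds, y_oob, eta=1.0):
    """subtree_preds: (n_subtrees, n_oob) out-of-bag predictions.
    Returns mixture weights proportional to exp(-eta * OOB loss)."""
    losses = np.mean((subtree_preds - y_oob) ** 2, axis=1)
    w = np.exp(-eta * (losses - losses.min()))   # shift for stability
    return w / w.sum()   # use as weights over the subtrees' predictions
```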
【3】 Explainability Requires Interactivity
标题:可解释性需要交互性
链接:https://arxiv.org/abs/2109.07869
作者:Matthias Kirchler,Martin Graf,Marius Kloft,Christoph Lippert
机构: Hasso Plattner Institute for Digital Engineering, University of Potsdam, Germany, TU Kaiserslautern, Kaiserslautern, Germany, Hasso Plattner Institute for Digital Health, Icahn School of Medicine at Mount Sinai, New York, NY
摘要:在解释深层神经网络的决定时,简单的故事很诱人,但也很危险。特别是在计算机视觉中,最流行的解释方法给用户一种错误的理解感,并提供了一幅过于简单的图片。我们引入了一个交互式框架来理解现代视觉模型中高度复杂的决策边界。它允许用户彻底检查、探测和测试网络的决策。在一系列案例研究中,我们比较了交互式方法与静态解释方法的威力,展示了这些方法如何导致用户误入歧途,并可能带来严重后果。
摘要:When explaining the decisions of deep neural networks, simple stories are
tempting but dangerous. Especially in computer vision, the most popular
explanation approaches give their users a false sense of comprehension and
provide an overly simplistic picture. We introduce an interactive framework to
understand the highly complex decision boundaries of modern vision models. It
allows the user to exhaustively inspect, probe, and test a network's decisions.
Across a range of case studies, we compare the power of our interactive
approach to static explanation methods, showing how these can lead a user
astray, with potentially severe consequences.
【4】 OMPQ: Orthogonal Mixed Precision Quantization
标题:OMPQ:正交混合精度量化
链接:https://arxiv.org/abs/2109.07865
作者:Yuexiao Ma,Taisong Jin,Xiawu Zheng,Yan Wang,Huixia Li,Guannan Jiang,Wei Zhang,Rongrong Ji
机构:School of Informatics, Xiamen University, Pinterest, USA, Contemporary Amperex Technology Co., Limited, Institute of Artificial Intelligence, Xiamen University
摘要:为了弥补深度神经网络复杂性与硬件性能之间日益增大的差距,网络量化引起了越来越多的研究关注。混合精度量化的最新趋势利用硬件的多比特宽度算术运算来释放网络量化的全部潜力。然而,这也导致了一个困难的整数规划公式,并迫使大多数现有方法使用一个非常耗时的搜索过程,即使有各种松弛。我们建议不直接求解原始的整数规划问题,而是优化一个代理度量,即网络正交性的概念,它与整数规划的损失高度相关,同时易于用线性规划进行优化。这种方法将搜索时间和所需的数据量减少了几个数量级,几乎不影响量化精度。具体而言,在训练后量化方面,我们在MobileNetV2上实现了71.27%的Top-1精度,搜索只需9秒,在ImageNet上微调只需1.4 GPU小时。我们的代码可在https://github.com/MAC-AutoML/OMPQ获取。
摘要:To bridge the ever increasing gap between deep neural networks' complexity
and hardware capability, network quantization has attracted more and more
research attention. The latest trend of mixed precision quantization takes
advantage of hardware's multiple bit-width arithmetic operations to unleash the
full potential of network quantization. However, this also results in a
difficult integer programming formulation, and forces most existing approaches
to use an extremely time-consuming search process even with various
relaxations. Instead of solving a problem of the original integer programming,
we propose to optimize a proxy metric, the concept of network orthogonality,
which is highly correlated with the loss of the integer programming but also
easy to optimize with linear programming. This approach reduces the search time
and required data amount by orders of magnitude, with little compromise on
quantization accuracy. Specifically, on post-training quantization, we achieve
71.27% Top-1 accuracy on MobileNetV2, which only takes 9 seconds for searching
and 1.4 GPU hours for finetuning on ImageNet. Our code is available at
https://github.com/MAC-AutoML/OMPQ.
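To illustrate why a linear-programming proxy is cheap, the sketch below solves a relaxed bit-width allocation: give more bits to layers with higher importance scores under a total model-size budget, then round. The `importance` vector stands in for the paper's network-orthogonality metric, and the rounding heuristic is an assumption; this is not the OMPQ algorithm itself.

```python
import numpy as np
from scipy.optimize import linprog

def allocate_bits(importance, n_params, budget_bits, lo=2, hi=8):
    """Maximize sum(importance * bits) s.t. sum(n_params * bits) <= budget.
    Continuous relaxation solved by LP, then rounded to integer bit-widths."""
    importance = np.asarray(importance, dtype=float)
    n_params = np.asarray(n_params, dtype=float)
    res = linprog(c=-importance,                   # linprog minimizes
                  A_ub=n_params[None, :], b_ub=[budget_bits],
                  bounds=[(lo, hi)] * importance.size)
    return np.round(res.x).astype(int)

# e.g. three layers, a 20-Mbit budget:
print(allocate_bits([3.0, 1.0, 2.0], [1e6, 2e6, 1e6], budget_bits=2e7))
```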
【5】 Reframing Instructional Prompts to GPTk's Language
标题:对GPTK语言的教学提示重构
链接:https://arxiv.org/abs/2109.07830
作者:Swaroop Mishra,Daniel Khashabi,Chitta Baral,Yejin Choi,Hannaneh Hajishirzi
机构:Arizona State University,Allen Institute for AI,University of Washington
备注:9 pages
摘要:模型设计者如何将任务说明转化为语言模型的有效提示?通过对GPT3的大量实证分析,我们观察到了成功指令提示的重要特征,并为模型设计者提出了几种重构技术来创建此类提示。例如,一个复杂的任务可以分解为多个简单的任务。我们对6个不同类别(问题生成、分类等)的12个NLP任务进行了实验。我们的结果表明,与现有的少样本(few-shot)基线相比,重构将少样本学习性能提高了14%,同时降低了样本复杂性。在大型语言模型(如GPT3)上,性能提升尤为重要,因为在GPT3中,无法对大型数据集进行模型调优或提示。此外,我们观察到,此类收益不仅限于GPT3;在不同的模型体系结构中,重新格式化的任务仍然优于原始指令,这突出了这些指南的跨模型通用性。我们希望这些经验驱动的技术将为将来更有效地提示语言模型铺平道路。
摘要:How can model designers turn task instructions into effective prompts for
language models? Backed by extensive empirical analysis on GPT3, we observe
important features for successful instructional prompts, and propose several
reframing techniques for model designers to create such prompts. For example, a
complex task can be decomposed into multiple simpler tasks. We experiment over
12 NLP tasks across 6 diverse categories (question generation, classification,
etc.). Our results show that reframing improves few-shot learning performance
by 14\% while reducing sample complexity over existing few-shot baselines. The
performance gains are particularly important on large language models, such as
GPT3 where tuning models or prompts on large datasets is not feasible.
Furthermore, we observe that such gains are not limited to GPT3; the reframed
tasks remain superior over raw instructions across different model
architectures, underscoring the cross-model generality of these guidelines. We
hope these empirical-driven techniques will pave way for more effective ways to
prompt LMs in future.
【6】 End-to-End Partially Observable Visual Navigation in a Diverse Environment
标题:不同环境下的端到端部分可见视觉导航
链接:https://arxiv.org/abs/2109.07752
作者:Bo Ai,Wei Gao,Vinay,David Hsu
机构:School of Computing, National University of Singapore
备注:8 pages, 6 figures, submitted to the IEEE International Conference on Robotics and Automation (ICRA), 2022. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
摘要:机器人如何能在丰富多样的环境中成功地导航,室内或室外,沿着办公室走廊或公园里的小径,在平坦的地面上,楼梯上或电梯上等?为此,这项工作针对三个挑战:(i)复杂的视觉观察,(ii)局部感知的部分可观察性,以及(iii)同时依赖于局部环境和高层目标的多模态导航行为。我们提出了一种新的神经网络(NN)体系结构来表示局部控制器,并利用端到端方法的灵活性来学习强大的策略。为了处理复杂的视觉观察,我们通过卷积层提取多尺度空间信息。为了处理部分可观测性,我们在类似LSTM的模块中编码丰富的历史信息。重要的是,我们将两者集成到一个统一的体系结构中,该体系结构利用卷积存储单元在多个空间尺度上跟踪观测历史,可以捕获观测和控制之间复杂的时空依赖关系。我们还将网络置于高层目标上,以生成不同的导航行为模式。具体而言,我们建议对不同的模式使用独立的存储单元,以防止学习策略中的模式崩溃。我们在SPOT机器人上实现了NN控制器,并在三个具有部分观察的挑战性任务中对其进行评估:对抗性行人回避、盲点障碍回避和电梯乘坐。我们的模型明显优于CNN、传统LSTM或我们模型的消融版本。演示视频将公开,展示我们的SPOT机器人穿越我们大学校园的许多不同位置。
摘要:How can a robot navigate successfully in a rich and diverse environment,
indoors or outdoors, along an office corridor or a trail in the park, on the
flat ground, the staircase, or the elevator, etc.? To this end, this work aims
at three challenges: (i) complex visual observations, (ii) partial
observability of local sensing, and (iii) multimodal navigation behaviors that
depend on both the local environment and the high-level goal. We propose a
novel neural network (NN) architecture to represent a local controller and
leverage the flexibility of the end-to-end approach to learn a powerful policy.
To tackle complex visual observations, we extract multiscale spatial
information through convolution layers. To deal with partial observability, we
encode rich history information in LSTM-like modules. Importantly, we integrate
the two into a single unified architecture that exploits convolutional memory
cells to track the observation history at multiple spatial scales, which can
capture the complex spatiotemporal dependencies between observations and
controls. We additionally condition the network on the high-level goal in order
to generate different navigation behavior modes. Specifically, we propose to
use independent memory cells for different modes to prevent mode collapse in
the learned policy. We implemented the NN controller on the SPOT robot and
evaluate it on three challenging tasks with partial observations: adversarial
pedestrian avoidance, blind-spot obstacle avoidance, and elevator riding. Our
model significantly outperforms CNNs, conventional LSTMs, or the ablated
versions of our model. A demo video will be publicly available, showing our
SPOT robot traversing many different locations on our university campus.
【7】 Scaling Laws for Neural Machine Translation
标题:神经机器翻译的标度律
链接:https://arxiv.org/abs/2109.07740
作者:Behrooz Ghorbani,Orhan Firat,Markus Freitag,Ankur Bapna,Maxim Krikun,Xavier Garcia,Ciprian Chelba,Colin Cherry
备注:31 pages, 23 figures
摘要:我们对神经机器翻译(NMT)中使用的编码器-解码器Transformer模型的缩放特性进行了实证研究。我们证明了作为模型尺寸函数的交叉熵损失遵循一定的标度律。具体来说,(i)我们提出了一个公式,该公式将交叉熵损失的标度行为描述为编码器和解码器大小的二元函数,并表明它在各种标度方法和语言下给出了准确的预测;我们证明,仅参数的总数不足以达到这样的目的。(ii)在缩放解码器与缩放编码器时,我们观察到不同的幂律指数,并根据这一观察结果为编码器/解码器容量的优化分配提供建议。(iii)我们还报告,模型的缩放行为受到训练/测试集构成偏差的严重影响,我们将其定义为与自然生成文本(通过机器生成或人工翻译文本)的任何偏差。我们观察到,目标端的自然文本具有良好的缩放特性,这表现为成功减少了交叉熵损失。(iv)最后,我们研究了交叉熵损失与生成译文质量之间的关系。根据测试数据的性质,我们发现了两种不同的行为。对于最初从目标语言翻译为源语言的测试集,损失和BLEU分数随着模型大小的增加而提高。相反,对于最初从源语言翻译成目标语言的测试集,损失有所改善,但BLEU分数在某个阈值后停止改善。我们发布了本研究中使用的所有模型生成的文本。
摘要:We present an empirical study of scaling properties of encoder-decoder
Transformer models used in neural machine translation (NMT). We show that
cross-entropy loss as a function of model size follows a certain scaling law.
Specifically (i) We propose a formula which describes the scaling behavior of
cross-entropy loss as a bivariate function of encoder and decoder size, and
show that it gives accurate predictions under a variety of scaling approaches
and languages; we show that the total number of parameters alone is not
sufficient for such purposes. (ii) We observe different power law exponents
when scaling the decoder vs scaling the encoder, and provide recommendations
for optimal allocation of encoder/decoder capacity based on this observation.
(iii) We also report that the scaling behavior of the model is acutely
influenced by composition bias of the train/test sets, which we define as any
deviation from naturally generated text (either via machine generated or human
translated text). We observe that natural text on the target side enjoys
scaling, which manifests as successful reduction of the cross-entropy loss.
(iv) Finally, we investigate the relationship between the cross-entropy loss
and the quality of the generated translations. We find two different behaviors,
depending on the nature of the test data. For test sets which were originally
translated from target language to source language, both loss and BLEU score
improve as model size increases. In contrast, for test sets originally
translated from source language to target language, the loss improves, but the
BLEU score stops improving after a certain threshold. We release generated text
from all models used in this study.
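The bivariate law described in (i) can be written, in one plausible form consistent with the abstract, as a product of encoder and decoder power laws plus an irreducible floor: L(Ne, Nd) = alpha * Ne^(-pe) * Nd^(-pd) + L_inf. The snippet below fits such a form with scipy on synthetic points; the functional form and every constant are assumptions for illustration, not the paper's fitted values.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(x, alpha, p_e, p_d, L_inf):
    n_enc, n_dec = x
    return alpha * n_enc ** (-p_e) * n_dec ** (-p_d) + L_inf

# Synthetic (encoder params, decoder params, dev loss) generated from the
# assumed law itself, standing in for measurements from trained models.
n_enc = np.array([1e7, 1e7, 1e8, 1e8, 1e9, 1e9])
n_dec = np.array([1e7, 1e8, 1e7, 1e8, 1e7, 1e8])
loss = scaling_law((n_enc, n_dec), 120.0, 0.35, 0.25, 1.2)
params, _ = curve_fit(scaling_law, (n_enc, n_dec), loss,
                      p0=(100.0, 0.3, 0.3, 1.0), maxfev=20000)
print(dict(zip(["alpha", "p_e", "p_d", "L_inf"], np.round(params, 3))))
```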
【8】 Efficient Differentiable Simulation of Articulated Bodies
标题:铰接体的高效可微仿真
链接:https://arxiv.org/abs/2109.07719
作者:Yi-Ling Qiao,Junbang Liang,Vladlen Koltun,Ming C. Lin
机构:University of Maryland
备注:ICML 2021
摘要:我们提出了一种高效的铰接体可微模拟方法。这使得关节机构动力学能够集成到深度学习框架中,并对关节机构上运行的神经网络进行基于梯度的优化。我们使用空间代数和伴随方法推导了正向动力学的梯度。我们的方法比autodiff工具快一个数量级。通过在整个模拟过程中只保存初始状态,我们的方法将内存需求减少了两个数量级。我们证明了铰接体的高效可微动力学在各种应用中的效用。我们表明,使用我们的方法提供的梯度,铰接系统的强化学习可以加速。在控制和反问题的应用中,我们的工作实现的基于梯度的优化将收敛速度提高一个数量级以上。
摘要:We present a method for efficient differentiable simulation of articulated
bodies. This enables integration of articulated body dynamics into deep
learning frameworks, and gradient-based optimization of neural networks that
operate on articulated bodies. We derive the gradients of the forward dynamics
using spatial algebra and the adjoint method. Our approach is an order of
magnitude faster than autodiff tools. By only saving the initial states
throughout the simulation process, our method reduces memory requirements by
two orders of magnitude. We demonstrate the utility of efficient differentiable
dynamics for articulated bodies in a variety of applications. We show that
reinforcement learning with articulated systems can be accelerated using
gradients provided by our method. In applications to control and inverse
problems, gradient-based optimization enabled by our work accelerates
convergence by more than an order of magnitude.
【9】 Transferable Persona-Grounded Dialogues via Grounded Minimal Edits
标题:通过基于角色的最小编辑实现可转移的基于角色的对话
链接:https://arxiv.org/abs/2109.07713
作者:Chen Henry Wu,Yinhe Zheng,Xiaoxi Mao,Minlie Huang
机构: Department of Computer Science and Technology, Institute for Artificial Intelligence, State Key Lab of Intelligent Technology and Systems, Beijing National Research, Center for Information Science and Technology, Tsinghua University, Beijing, China
备注:Accepted to EMNLP 2021
摘要:扎根的对话模式产生基于某些概念的回应。受扎根对话数据分布的限制,基于此类数据训练的模型在数据分布和扎根概念类型方面面临可转移性挑战。为了应对这些挑战,我们提出了扎根的最小编辑框架,该框架将根据给定的概念对现有的响应进行最小程度的编辑。针对人物角色,我们提出了扎根最小编辑器(Grounded Minimal Editor,GME),它通过分离和重组响应中与人物角色相关和不可知人物角色的部分来学习编辑。为了评估基于角色的最小编辑,我们提供了PersonaMinEdit数据集,实验结果表明GME在很大程度上优于竞争基线。为了评估可转移性,我们在BlendedSkillTalk测试集上进行了实验,结果表明GME可以编辑对话模型的反应,从而在保留知识和移情的同时,大大提高其角色一致性。
摘要:Grounded dialogue models generate responses that are grounded on certain
concepts. Limited by the distribution of grounded dialogue data, models trained
on such data face the transferability challenges in terms of the data
distribution and the type of grounded concepts. To address the challenges, we
propose the grounded minimal editing framework, which minimally edits existing
responses to be grounded on the given concept. Focusing on personas, we propose
Grounded Minimal Editor (GME), which learns to edit by disentangling and
recombining persona-related and persona-agnostic parts of the response. To
evaluate persona-grounded minimal editing, we present the PersonaMinEdit
dataset, and experimental results show that GME outperforms competitive
baselines by a large margin. To evaluate the transferability, we experiment on
the test set of BlendedSkillTalk and show that GME can edit dialogue models'
responses to largely improve their persona consistency while preserving the use
of knowledge and empathy.
【10】 Exploiting Activation based Gradient Output Sparsity to Accelerate Backpropagation in CNNs
标题:利用基于激活的梯度输出稀疏性加速CNN的反向传播
链接:https://arxiv.org/abs/2109.07710
作者:Anup Sarma,Sonali Singh,Huaipan Jiang,Ashutosh Pattnaik,Asit K Mishra,Vijaykrishnan Narayanan,Mahmut T Kandemir,Chita R Das
机构:The Pennsylvania State University
摘要:基于机器/深度学习(ML/DL)的技术正在成为许多尖端技术背后的驱动力,实现了图像分类和目标检测等计算机视觉工作负载的高精度。然而,训练这些具有大量参数的模型既耗时又耗能。在这方面,一些先前的工作已经提倡利用稀疏性来加快DL训练、尤其是推理阶段的速度。这项工作首先观察到,在训练过程中,前向和后向传播的稀疏性是相关的。在此背景下,我们研究了基于梯度下降的优化算法中固有的两种稀疏性(输入和输出类型),并提出了一种硬件微体系结构来利用这两种稀疏性。我们的实验结果在Imagenet数据集上使用了五种最先进的CNN模型,与密集基线执行相比,反向传播速度在1.69$\times$到5.43$\times$之间。通过利用前向和后向传播中的稀疏性,与稀疏性无关的基线执行相比,加速比的提高范围从1.68$\times$到3.30$\times$。我们的工作还实现了在几个先前提出的密集和稀疏加速器平台上显著减少训练迭代时间,以及在基于GPU的执行上实现数量级的能效改进。
Abstract: Machine/deep-learning (ML/DL) based techniques are emerging as a driving
force behind many cutting-edge technologies, achieving high accuracy on
computer vision workloads such as image classification and object detection.
However, training these models involving large parameters is both
time-consuming and energy-hogging. In this regard, several prior works have
advocated for sparsity to speed up DL training and, more so, the
inference phase. This work begins with the observation that during training,
sparsity in the forward and backward passes are correlated. In that context, we
investigate two types of sparsity (input and output type) inherent in gradient
descent-based optimization algorithms and propose a hardware micro-architecture
to leverage the same. Our experimental results use five state-of-the-art CNN
models on the Imagenet dataset, and show back propagation speedups in the range
of 1.69$\times$ to 5.43$\times$, compared to the dense baseline execution. By
exploiting sparsity in both the forward and backward passes, speedup
improvements range from 1.68$\times$ to 3.30$\times$ over the sparsity-agnostic
baseline execution. Our work also achieves significant reduction in training
iteration time over several previously proposed dense as well as sparse
accelerator based platforms, in addition to achieving order of magnitude energy
efficiency improvements over GPU based execution.
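The forward/backward correlation that this observation rests on is easy to see for ReLU layers; the numpy fragment below (our illustration, not the proposed micro-architecture) shows that the backward gradient with respect to a pre-activation is masked by exactly the pattern that made the forward activation sparse, so zero positions can be skipped in both passes.

```python
# For a ReLU layer, forward sparsity (a == 0) and backward sparsity
# (dL/dz == 0) share the same mask; an accelerator can skip the
# multiply-accumulates at those positions in both passes.
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=(256, 1024))        # pre-activations
a = np.maximum(z, 0.0)                  # forward: ReLU activation
upstream = rng.normal(size=a.shape)     # dL/da arriving from the next layer
dz = upstream * (z > 0)                 # backward: dL/dz

print(f"forward zeros:  {np.mean(a == 0):.2%}")
print(f"backward zeros: {np.mean(dz == 0):.2%}")   # identical mask
```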
【11】 Federated Submodel Averaging
Link: https://arxiv.org/abs/2109.07704
Authors: Yucheng Ding, Chaoyue Niu, Fan Wu, Shaojie Tang, Chengfei Lv, Yanghe Feng, Guihai Chen
Affiliations: Shanghai Jiao Tong University, China; University of Texas at Dallas, USA; Alibaba Group, China; National University of Defense Technology, China
Abstract: We study practical data characteristics underlying federated learning, where
non-i.i.d. data from clients have sparse features, and a certain client's local
data normally involves only a small part of the full model, called a submodel.
Due to data sparsity, the classical federated averaging (FedAvg) algorithm or
its variants will be severely slowed down, because when updating the global
model, each client's zero update of the full model excluding its submodel is
inaccurately aggregated. Therefore, we propose federated submodel averaging
(FedSubAvg), ensuring that the expectation of the global update of each model
parameter is equal to the average of the local updates of the clients who
involve it. We theoretically proved the convergence rate of FedSubAvg by
deriving an upper bound under a new metric called the element-wise gradient
norm. In particular, this new metric can characterize the convergence of
federated optimization over sparse data, while the conventional metric of
squared gradient norm used in FedAvg and its variants cannot. We extensively
evaluated FedSubAvg over both public and industrial datasets. The evaluation
results demonstrate that FedSubAvg significantly outperforms FedAvg and its
variants.
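A minimal sketch of the aggregation rule as we read it from the abstract: each parameter's global update is averaged over only the clients whose submodels involve that parameter, rather than over all clients. The function name and mask encoding below are illustrative assumptions.

```python
# Hedged sketch of per-parameter submodel averaging: parameters untouched
# by a client do not drag the global average toward zero.
import numpy as np

def fed_sub_avg(updates, masks):
    """updates: list of (d,) local updates (zero outside each submodel);
    masks: list of (d,) booleans, True where the client involves the
    parameter. Returns the per-parameter average over involved clients."""
    num = np.zeros_like(updates[0])
    cnt = np.zeros_like(updates[0])
    for u, m in zip(updates, masks):
        num += np.where(m, u, 0.0)
        cnt += m
    return np.divide(num, cnt, out=np.zeros_like(num), where=cnt > 0)

u = [np.array([1., 1., 0., 0., 0.]),    # client 0 involves params {0, 1}
     np.array([3., 0., 3., 0., 0.]),    # client 1 involves params {0, 2}
     np.array([0., 0., 0., 2., 0.])]    # client 2 involves param  {3}
m = [np.array([1, 1, 0, 0, 0], bool),
     np.array([1, 0, 1, 0, 0], bool),
     np.array([0, 0, 0, 1, 0], bool)]
print(fed_sub_avg(u, m))                # [2. 1. 3. 2. 0.]
```

Plain FedAvg would divide every coordinate by the total number of clients, diluting updates on sparse features; the per-coordinate count is what restores the expectation property quoted in the abstract.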
【12】 ROS-X-Habitat: Bridging the ROS Ecosystem with Embodied AI
Link: https://arxiv.org/abs/2109.07703
Authors: Guanxiong Chen, Haoyu Yang, Ian M. Mitchell
Affiliations: The University of British Columbia, Vancouver, BC
Comments: Submitted to RA-L + ICRA 2022
Abstract: We introduce ROS-X-Habitat, a software interface that bridges the AI Habitat
platform for embodied reinforcement learning agents with other robotics
resources via ROS. This interface not only offers standardized communication
protocols between embodied agents and simulators, but also enables
physics-based simulation. With this interface, roboticists are able to train
their own Habitat RL agents in another simulation environment or to develop
their own robotic algorithms inside Habitat Sim. Through in silico experiments,
we demonstrate that ROS-X-Habitat has minimal impact on the navigation
performance and simulation speed of Habitat agents; that a standard set of ROS
mapping, planning and navigation tools can run in the Habitat simulator, and
that a Habitat agent can run in the standard ROS simulator Gazebo.
【13】 On-the-Fly Ensemble Pruning in Evolving Data Streams
Link: https://arxiv.org/abs/2109.07611
Authors: Sanem Elbasi, Alican Büyükçakır, Hamed Bonab, Fazli Can
Affiliations: Bilkent Information Retrieval Group, Bilkent University; College of Information and Computer Sciences, University of Massachusetts Amherst
Comments: 5 pages, 2 figures
Abstract: Ensemble pruning is the process of selecting a subset of component
classifiers from an ensemble which performs at least as well as the original
ensemble while reducing storage and computational costs. Ensemble pruning in
data streams is a largely unexplored area of research. It requires analysis of
ensemble components as they are running on the stream, and differentiation of
useful classifiers from redundant ones. We present CCRP, an on-the-fly ensemble
pruning method for multi-class data stream classification empowered by an
imbalance-aware fusion of class-wise component rankings. CCRP aims to ensure
that the resulting pruned ensemble contains the best performing classifier for
each target class and hence reduces the effects of class imbalance. The
conducted experiments on real-world and synthetic data streams demonstrate that
different types of ensembles that integrate CCRP as their pruning scheme
consistently yield on par or superior performance with 20% to 90% less average
memory consumption. Lastly, we validate the proposed pruning scheme by
comparing our approach against pruning schemes based on ensemble weights and
basic rank fusion methods.
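A hedged sketch of the class-wise selection idea, under our reading of the abstract (the paper's ranking and imbalance-aware fusion are more elaborate): score each ensemble component per class on recent stream instances, then keep the union of the top-ranked components so every target class retains its best classifier.

```python
# Illustrative pruning by class-wise component rankings; scores could be,
# e.g., per-class recall of each component over a recent window.
import numpy as np

def prune_class_wise(class_scores, keep_per_class=1):
    """class_scores: (n_components, n_classes) array of per-class scores.
    Returns indices of the components to keep."""
    keep = set()
    for c in range(class_scores.shape[1]):
        best_first = np.argsort(class_scores[:, c])[::-1]
        keep.update(best_first[:keep_per_class].tolist())
    return sorted(keep)

scores = np.array([[0.9, 0.2, 0.4],     # component 0: strong on class 0
                   [0.3, 0.8, 0.5],     # component 1: strong on class 1
                   [0.5, 0.6, 0.7],     # component 2: strong on class 2
                   [0.4, 0.3, 0.6]])    # component 3: dominated everywhere
print(prune_class_wise(scores))         # [0, 1, 2] -> component 3 pruned
```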
【14】 A Column Streaming-Based Convolution Engine and Mapping Algorithm for CNN-based Edge AI accelerators
Link: https://arxiv.org/abs/2109.07601
Authors: Weison Lin, Tughrul Arslan
Affiliations: Institute for Integrated Micro and Nano Systems, School of Engineering, The University of Edinburgh, Edinburgh, UK
Abstract: Edge AI accelerators have been emerging as a solution for near-customer
applications in areas such as unmanned aerial vehicles (UAVs), image
recognition sensors, wearable devices, robotics, and remote sensing satellites.
These applications must meet not only performance targets but also strict area
and power constraints due to their portable mobility and limited power sources.
This paper therefore proposes a column streaming-based convolution engine that
includes column sets of processing elements designed for flexibility in
applicability to different CNN algorithms in edge AI accelerators. Compared to
a commercialized CNN accelerator, the key results reveal that the column
streaming-based convolution engine requires similar execution cycles for
processing a 227 x 227 feature map while avoiding zero-padding penalties.
【15】 Differentiable Physics: A Position Piece
Link: https://arxiv.org/abs/2109.07573
Authors: Bharath Ramsundar, Dilip Krishnamurthy, Venkatasubramanian Viswanathan
Affiliations: Deep Forest Sciences Inc.; Department of Mechanical Engineering, Carnegie Mellon University
Comments: 12 pages, 1 figure
Abstract: Differentiable physics provides a new approach for modeling and understanding
physical systems by pairing the new technology of differentiable
programming with classical numerical methods for physical simulation. We survey
the rapidly growing literature of differentiable physics techniques and
highlight methods for parameter estimation, learning representations, solving
differential equations, and developing what we call scientific foundation
models using data and inductive priors. We argue that differentiable physics
offers a new paradigm for modeling physical phenomena by combining classical
analytic solutions with numerical methodology using the bridge of
differentiable programming.
【16】 CounterNet: End-to-End Training of Counterfactual Aware Predictions
Link: https://arxiv.org/abs/2109.07557
Authors: Hangzhi Guo, Thanh Hong Nguyen, Amulya Yadav
Affiliations: College of Information Sciences and Technology, Pennsylvania State University; Computer and Information Science, University of Oregon
Abstract: This work presents CounterNet, a novel end-to-end learning framework which
integrates the predictive model training and counterfactual (CF) explanation
generation into a single end-to-end pipeline. Counterfactual explanations
attempt to find the smallest modification to the feature values of an instance
that changes the prediction of the ML model to a predefined output. Prior CF
explanation techniques rely on solving separate time-intensive optimization
problems for every single input instance to find CF examples, and also suffer
from the misalignment of objectives between model predictions and explanations,
which leads to significant shortcomings in the quality of CF explanations.
CounterNet, on the other hand, integrates both prediction and explanation in
the same framework, which enables the optimization of the CF example generation
only once together with the predictive model. We propose a novel variant of
back-propagation which can help in effectively training CounterNet's network.
Finally, we conduct extensive experiments on multiple real-world datasets. Our
results show that CounterNet generates high-quality predictions, and
corresponding CF examples (with high validity) for any new input instance
significantly faster than existing state-of-the-art baselines.
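The coupling of prediction and explanation can be sketched as a shared encoder with two heads and a joint loss. The sketch below is our hedged reading of the abstract, with illustrative layer sizes, loss weights, and counterfactual-head design; it is not the paper's architecture or its back-propagation variant.

```python
# Schematic joint objective: prediction loss + CF validity (the CF example
# should flip the model's own prediction) + proximity (minimal edit).
import torch
import torch.nn as nn

class PredictAndExplain(nn.Module):
    def __init__(self, d_in, d_hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.predictor = nn.Linear(d_hidden, 1)    # binary prediction head
        self.cf_head = nn.Linear(d_hidden, d_in)   # emits a delta on x

    def forward(self, x):
        h = self.encoder(x)
        return self.predictor(h), x + self.cf_head(h)

def joint_loss(model, x, y, lam_valid=1.0, lam_prox=0.1):
    """x: (B, d_in) float features; y: (B,) float 0/1 labels."""
    y_logit, x_cf = model(x)
    bce = nn.functional.binary_cross_entropy_with_logits
    pred = bce(y_logit.squeeze(1), y)
    y_cf_logit, _ = model(x_cf)
    valid = bce(y_cf_logit.squeeze(1), 1.0 - y)    # CF should flip the label
    prox = (x_cf - x).pow(2).mean()                # keep the edit small
    return pred + lam_valid * valid + lam_prox * prox
```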
【17】 Discovering Useful Compact Sets of Sequential Rules in a Long Sequence
Link: https://arxiv.org/abs/2109.07519
Authors: Erwan Bourrand, Luis Galárraga, Esther Galbrun, Elisa Fromont, Alexandre Termier
Affiliations: Univ Rennes, IRISA UMR, Rennes, France; Advisor_SLA, Inc.; Inria RBA, Rennes, France; University of Eastern Finland
Comments: 8 pages, published in the proceedings of the 33rd IEEE International Conference on Tools with Artificial Intelligence
Abstract: We are interested in understanding the underlying generation process for long
sequences of symbolic events. To do so, we propose COSSU, an algorithm to mine
small and meaningful sets of sequential rules. The rules are selected using an
MDL-inspired criterion that favors compactness and relies on a novel rule-based
encoding scheme for sequences. Our evaluation shows that COSSU can successfully
retrieve relevant sets of closed sequential rules from a long sequence. Such
rules constitute an interpretable model that exhibits competitive accuracy for
the tasks of next-element prediction and classification.
【18】 Generalized XGBoost Method
Link: https://arxiv.org/abs/2109.07473
Authors: Yang Guang
Abstract: The XGBoost method has many advantages and is especially suitable for
statistical analysis of big data, but its loss function is limited to convex
functions. In many specific applications, a nonconvex loss function would be
preferable. In this paper, we propose a generalized XGBoost method, which
requires weaker conditions on the loss function and involves more general loss
functions, including convex loss functions and some non-convex loss functions.
Furthermore, this generalized XGBoost method is extended to multivariate loss
function to form a more generalized XGBoost method. This method is a
multivariate regularized tree boosting method, which can model multiple
parameters in most of the frequently-used parametric probability distributions
to be fitted by predictor variables. Meanwhile, the related algorithms and some
examples in non-life insurance pricing are given.
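For context, the standard XGBoost library already exposes the hook such a method would build on: a custom objective supplying the gradient and Hessian of an arbitrary loss. The sketch below plugs in the non-convex Cauchy loss log(1 + r^2/c^2); because its true second derivative can be negative, we substitute a constant positive surrogate Hessian, a common practical workaround and not the paper's method.

```python
# Hedged sketch: training xgboost with a non-convex loss via the custom
# objective hook (obj=...). The surrogate Hessian keeps the Newton step
# well defined; the paper develops a more principled treatment.
import numpy as np
import xgboost as xgb

C = 1.0

def cauchy_obj(predt, dtrain):
    r = predt - dtrain.get_label()         # residuals
    grad = 2.0 * r / (C**2 + r**2)         # d/dr of log(1 + r^2 / C^2)
    hess = np.full_like(r, 2.0 / C**2)     # positive surrogate Hessian
    return grad, hess

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(size=500)
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain,
                    num_boost_round=100, obj=cauchy_obj)
```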
【19】 Frame by frame completion probability of an NFL pass
Link: https://arxiv.org/abs/2109.08051
Authors: Gustavo Pompeu da Silva, Rafael de Andrade Moral
Affiliations: Department of Exact Sciences, University of São Paulo, Av. Pádua Dias, Piracicaba, São Paulo, Brazil; Department of Mathematics and Statistics, University, County Kildare, Ireland
Comments: 26 pages, 13 figures, 5 tables
Abstract: American football is an increasingly popular sport, with a growing audience
in many countries in the world. The most watched American football league in
the world is the United States' National Football League (NFL), where every
offensive play can be either a run or a pass, and in this work we focus on
passes. Many factors can affect the probability of pass completion, such as
receiver separation from the nearest defender, distance from receiver to
passer, offense formation, among many others. When predicting the completion
probability of a pass, it is essential to know who the target of the pass is.
By using distance measures between players and the ball, it is possible to
calculate empirical probabilities and predict very accurately who the target
will be. The big question is: how likely is it for a pass to be completed in an
NFL match while the ball is in the air? We developed a machine learning
algorithm to answer this based on several predictors. Using data from the 2018
NFL season, we obtained conditional and marginal predictions for pass
completion probability based on a random forest model. This is based on a
two-stage procedure: first, we calculate the probability of each offensive
player being the pass target, then, conditional on the target, we predict
completion probability based on the random forest model. Finally, the general
completion probability can be calculated using the law of total probability. We
present animations for selected plays and show the pass completion probability
evolution.
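The two-stage construction is an application of the law of total probability, P(complete | x) = sum_j P(target = j | x) * P(complete | x, target = j). The sketch below shows the combination with scikit-learn random forests on synthetic stand-in data; the feature layout and model settings are our illustrative assumptions, not the paper's.

```python
# Hedged sketch of the two-stage completion-probability estimate.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n, d, n_receivers = 2000, 6, 5
X = rng.normal(size=(n, d))                  # separations, depths, etc.
target = rng.integers(0, n_receivers, n)     # which receiver was targeted
complete = rng.integers(0, 2, n)             # pass completed or not

target_model = RandomForestClassifier(n_estimators=200, random_state=0)
target_model.fit(X, target)
comp_model = RandomForestClassifier(n_estimators=200, random_state=0)
comp_model.fit(np.column_stack([X, target]), complete)

def completion_probability(x):
    """Marginal P(complete | x) by total probability over the target."""
    p_target = target_model.predict_proba(x[None, :])[0]
    p_comp = np.array([comp_model.predict_proba(np.append(x, j)[None, :])[0, 1]
                       for j in range(n_receivers)])
    return float(p_target @ p_comp)

print(completion_probability(X[0]))
```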
【20】 PDBench: Evaluating Computational Methods for Protein Sequence Design
Link: https://arxiv.org/abs/2109.07925
Authors: Leonardo V. Castorina, Rokas Petrenas, Katric Subr, Christopher W. Wood
Affiliations: School of Informatics, University of Edinburgh, Crichton St, Newington, Edinburgh; School of Biological Sciences, University of Edinburgh, Roger Land Building, Edinburgh
Comments: 9 pages, 5 figures
Abstract: Proteins perform critical processes in all living systems: converting solar
energy into chemical energy, replicating DNA, as the basis of highly performant
materials, sensing and much more. While an incredible range of functionality
has been sampled in nature, it accounts for a tiny fraction of the possible
protein universe. If we could tap into this pool of unexplored protein
structures, we could search for novel proteins with useful properties that we
could apply to tackle the environmental and medical challenges facing humanity.
This is the purpose of protein design.
Sequence design is an important aspect of protein design, and many successful
methods to do this have been developed. Recently, deep-learning methods that
frame it as a classification problem have emerged as a powerful approach.
Beyond their reported improvement in performance, their primary advantage over
physics-based methods is that the computational burden is shifted from the user
to the developers, thereby increasing accessibility to the design method.
Despite this trend, the tools for assessment and comparison of such models
remain quite generic. The goal of this paper is to both address the timely
problem of evaluation and to shine a spotlight, within the Machine Learning
community, on specific assessment criteria that will accelerate impact.
We present a carefully curated benchmark set of proteins and propose a number
of standard tests to assess the performance of deep learning based methods. Our
robust benchmark provides biological insight into the behaviour of design
methods, which is essential for evaluating their performance and utility. We
compare five existing models with two novel models for sequence prediction.
Finally, we test the designs produced by these models with AlphaFold2, a
state-of-the-art structure-prediction algorithm, to determine if they are
likely to fold into the intended 3D shapes.
【21】 Beyond 5G RIS mmWave Systems: Where Communication and Localization Meet
Link: https://arxiv.org/abs/2109.07729
Authors: Jiguang He, Fan Jiang, Kamran Keykhosravi, Joonas Kokkoniemi, Henk Wymeersch, Markku Juntti
Comments: 7 pages, 6 figures, submitted to IEEE Vehicular Technology Magazine
Abstract: Upcoming beyond fifth generation (5G) communications systems aim at further
enhancing key performance indicators and fully supporting brand new use cases
by embracing emerging techniques, e.g., reconfigurable intelligent surface
(RIS), integrated communication, localization, and sensing, and mmWave/THz
communications. The wireless intelligence empowered by state-of-the-art
artificial intelligence techniques has been widely considered at the
transceivers, and now the paradigm is deemed to be shifted to the smart control
of radio propagation environment by virtue of RISs. In this article, we argue
that to harness the full potential of RISs, localization and communication must
be tightly coupled. This is in sharp contrast to 5G and earlier generations,
where localization was a minor additional service. To support this, we first
introduce the fundamentals of RIS mmWave channel modeling, followed by RIS
channel state information acquisition and link establishment. Then, we deal
with the connection between localization and communications, from a separate
and joint perspective.
【22】 Computationally-Efficient Climate Predictions using Multi-Fidelity Surrogate Modelling
Link: https://arxiv.org/abs/2109.07468
Authors: Ben Hudson, Frederik Nijweide, Isaac Sebenius
Affiliations: Computer Lab, University of Cambridge
Comments: Submitted to CDCEO 2021 (1st Workshop on Complex Data Challenges in Earth Observation)
Abstract: Accurately modelling the Earth's climate has widespread applications ranging
from forecasting local weather to understanding global climate change.
Low-fidelity simulations of climate phenomena are readily available, but
high-fidelity simulations are expensive to obtain. We therefore investigate the
potential of Gaussian process-based multi-fidelity surrogate modelling as a way
to produce high-fidelity climate predictions at low cost. Specifically, our
model combines the predictions of a low-fidelity Global Climate Model (GCM) and
those of a high-fidelity Regional Climate Model (RCM) to produce high-fidelity
temperature predictions for a mountainous region on the coastline of Peru. We
are able to produce high-fidelity temperature predictions at significantly
lower computational cost compared to the high-fidelity model alone: our
predictions have an average error of $15.62^\circ\text{C}^2$ yet our approach
only evaluates the high-fidelity model on 6% of the region of interest.
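A simplified two-fidelity surrogate in this spirit can be sketched with two Gaussian processes: one fit to plentiful cheap runs, and one fit to the high-fidelity residuals at the few expensive points. This is a Kennedy-O'Hagan-style simplification on synthetic stand-in data, not the paper's exact model.

```python
# Hedged sketch: GP on the low-fidelity signal plus GP on the sparse
# high-minus-low residuals; their sum is the fused high-fidelity prediction.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X_lo = rng.uniform(0, 10, size=(60, 1))     # cheap (GCM-like) runs: many
X_hi = rng.uniform(0, 10, size=(6, 1))      # expensive (RCM-like) runs: few

def f_lo(x):                                # synthetic low-fidelity response
    return np.sin(x).ravel()

def f_hi(x):                                # synthetic high-fidelity response
    return np.sin(x).ravel() + 0.3 * np.cos(3 * x).ravel()

gp_lo = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
gp_lo.fit(X_lo, f_lo(X_lo))
resid = f_hi(X_hi) - gp_lo.predict(X_hi)    # high minus low, sparse points
gp_res = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
gp_res.fit(X_hi, resid)

X_test = np.linspace(0, 10, 200)[:, None]
y_fused = gp_lo.predict(X_test) + gp_res.predict(X_test)
```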
【23】 Fermion Sampling Made More Efficient
Link: https://arxiv.org/abs/2109.07358
Authors: Haoran Sun, Jie Zou, Xiaopeng Li
Affiliations: Cavendish Laboratory, University of Cambridge, Cambridge, UK; State Key Laboratory of Surface Physics, Institute of Nanoelectronics and Quantum Computing, and Department of Physics, Fudan University, Shanghai, China
Abstract: Fermion sampling is the task of generating the probability distribution of a
many-body Slater-determinant wavefunction, termed a "determinantal point
process" in statistical analysis. Owing to its inherently embedded Pauli
exclusion principle,
its application reaches beyond simulating fermionic quantum many-body physics
to constructing machine learning models for diversified datasets. Here we
propose a fermion sampling algorithm, which has a polynomial time-complexity --
quadratic in the fermion number and linear in the system size. This algorithm
is about 100% more efficient in computation time than the best known
algorithms. In sampling the corresponding marginal distribution, our algorithm
has a more drastic improvement, achieving a scaling advantage. We demonstrate
its power on several test applications, including sampling fermions in a
many-body system and a machine learning task of text summarization, and confirm
its improved computation efficiency over other methods by counting
floating-point operations.