cs.LG, 169 papers today
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (10 papers)
【1】 Recommender systems based on graph embedding techniques: A comprehensive review
Link: https://arxiv.org/abs/2109.09587
Authors: Yue Deng
Affiliations: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, P.R. China
Abstract: Recommender systems, a pivotal tool to alleviate the information overload
problem, aim to predict user's preferred items from millions of candidates by
analyzing observed user-item relations. As for tackling the sparsity and cold
start problems encountered by recommender systems, uncovering hidden (indirect)
user-item relations by employing side information and knowledge to enrich
observed information for the recommendation has been proven promising recently;
and its performance is largely determined by the scalability of recommendation
models in the face of the high complexity and large scale of side information
and knowledge. Making great strides towards efficiently utilizing complex and
large-scale data, research into graph embedding techniques is a major topic.
Equipping recommender systems with graph embedding techniques helps them
outperform conventional recommendation methods that work directly on graph
topology analysis, and this direction has been widely studied in recent years.
This article systematically reviews graph embedding-based recommendation,
covering embedding techniques for bipartite graphs, general graphs, and
knowledge graphs, and proposes a general design pipeline for it. In addition,
comparing several representative graph embedding-based recommendation models
with the most commonly used conventional recommendation models in simulations
shows that
the conventional models overall outperform the graph embedding-based ones in
predicting implicit user-item interactions, revealing the relative weakness of
graph embedding-based recommendation in these tasks. To foster future research,
this article proposes constructive suggestions on making a trade-off between
graph embedding-based recommendation and the conventional recommendation in
different tasks as well as some open questions.
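As an editorial illustration of the generic pipeline the survey describes, here is a minimal sketch (ours, not the paper's code): nodes of the user-item bipartite graph are embedded, and items are ranked for a user by embedding similarity. The random vectors are placeholders for embeddings learned by any method the survey covers (e.g., DeepWalk, TransE).

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 4, 6, 8
user_emb = rng.normal(size=(n_users, dim))  # placeholder for learned embeddings
item_emb = rng.normal(size=(n_items, dim))

def recommend(u, k=3):
    """Rank items for user u by inner-product similarity and return the top-k."""
    scores = item_emb @ user_emb[u]
    return np.argsort(-scores)[:k]

print(recommend(0))
```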
【2】 Knowledge Graph Question Answering via SPARQL Silhouette Generation
Link: https://arxiv.org/abs/2109.09475
Authors: Sukannya Purkayastha, Saswati Dana, Dinesh Garg, Dinesh Khandelwal, G P Shrivatsa Bhargav
Affiliations: TCS Research, IBM Research
Comments: 7 + 6 pages, 10 figures
Abstract: Knowledge Graph Question Answering (KGQA) has become a prominent area in
natural language processing due to the emergence of large-scale Knowledge
Graphs (KGs). Recently, Neural Machine Translation based approaches that
translate natural language queries into structured query languages have been
gaining momentum as a way of solving the KGQA task. However, most of these
methods struggle with out-of-vocabulary words, where test entities and
relations are not seen during training. In this work, we propose a modular
two-stage neural architecture to solve the KGQA task.
The first stage generates a sketch of the target SPARQL, called the SPARQL
silhouette, for the input question. This comprises (1) a noise simulator to
handle out-of-vocabulary words and to reduce vocabulary size, and (2) a
seq2seq model for text-to-SPARQL-silhouette generation. The second stage is a
Neural Graph Search Module: the SPARQL silhouette generated in the first stage
is distilled in the second stage by substituting the precise relations in the
predicted structure. We simulate ideal and realistic scenarios by designing a
noise simulator. Experimental results show that the quality of the generated
SPARQL silhouette in the first stage is outstanding for the ideal scenario,
but for the realistic scenario (i.e., a noisy linker), the quality of the
resulting SPARQL silhouette drops drastically. However, our neural graph
search module recovers it considerably. We show that our method achieves
reasonable performance, improving the state of the art by a margin of 3.72% F1
on the LC-QuAD-1 dataset. We believe our proposed approach is novel and will
lead to dynamic KGQA solutions that are suited for practical applications.
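To make the first stage concrete, here is a toy sketch (our illustration, using hypothetical dbr:/dbo: prefixes; not the authors' noise simulator) of how masking entities and relations yields a vocabulary-reduced SPARQL silhouette:

```python
import re

def to_silhouette(sparql: str) -> str:
    """Mask entity/relation IRIs so the decoder only learns SPARQL structure."""
    sparql = re.sub(r"dbr:\w+", "<ent>", sparql)  # entity placeholder
    sparql = re.sub(r"dbo:\w+", "<rel>", sparql)  # relation placeholder
    return sparql

print(to_silhouette("SELECT ?x WHERE { dbr:Berlin dbo:country ?x }"))
# -> SELECT ?x WHERE { <ent> <rel> ?x }
```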
【3】 Edge-similarity-aware Graph Neural Networks
Link: https://arxiv.org/abs/2109.09432
Authors: Vincent Mallet, Carlos G. Oliver, William L. Hamilton
Affiliations: Pasteur Institute; Mines ParisTech; Department of Computer Science, McGill University; Montreal Institute for Learning Algorithms
Abstract: Graphs are a ubiquitous data structure, as they offer a flexible and
compact representation. For instance, the 3D structure of RNA can be
efficiently represented as $\textit{2.5D graphs}$, graphs whose nodes are
nucleotides and whose edges represent chemical interactions. In this setting,
we have biological evidence of the similarity between edge types, as some
chemical interactions are more similar than others.
Machine learning on graphs has recently experienced a breakthrough with the
introduction of Graph Neural Networks (GNNs). These models can be framed as
message passing algorithms between graph nodes over graph edges. The messages
can depend on the edge type they are transmitted through, but no method
currently constrains how a message is altered when the edge type changes.
Motivated by the RNA use case, in this project we introduce a graph neural
network layer which can leverage prior information about similarities between
edges. We show that despite the theoretical appeal of including this similarity
prior, the empirical performance is not enhanced on the tasks and datasets we
include here.
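A hedged sketch of the idea as we read it (not the released layer; all names and shapes are our own): per-edge-type message transforms are tied together through a prior edge-type similarity matrix, so messages change smoothly when the edge type changes.

```python
import torch
import torch.nn as nn

class EdgeSimilarityConv(nn.Module):
    """Messages per edge type are similarity-weighted mixtures of shared bases."""
    def __init__(self, dim, n_edge_types, similarity):   # similarity: (T, T)
        super().__init__()
        self.bases = nn.Parameter(torch.randn(n_edge_types, dim, dim) * 0.1)
        self.register_buffer("S", similarity / similarity.sum(1, keepdim=True))

    def forward(self, x, edge_index, edge_type):
        W = torch.einsum("tu,uij->tij", self.S, self.bases)  # tied transforms
        src, dst = edge_index
        msgs = torch.einsum("eij,ej->ei", W[edge_type], x[src])
        return torch.relu(torch.zeros_like(x).index_add_(0, dst, msgs))

x = torch.randn(5, 8)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])   # src -> dst
edge_type = torch.tensor([0, 1, 0])
S = torch.tensor([[1.0, 0.8], [0.8, 1.0]])          # prior edge-type similarity
print(EdgeSimilarityConv(8, 2, S)(x, edge_index, edge_type).shape)
```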
【4】 A Meta-Learning Approach for Training Explainable Graph Neural Networks
Link: https://arxiv.org/abs/2109.09426
Authors: Indro Spinelli, Simone Scardapane, Aurelio Uncini
Abstract: In this paper, we investigate the degree of explainability of graph neural
networks (GNNs). Existing explainers work by finding global/local subgraphs to
explain a prediction, but they are applied after a GNN has already been
trained. Here, we propose a meta-learning framework for improving the level of
explainability of a GNN directly at training time, by steering the optimization
procedure towards what we call `interpretable minima'. Our framework (called
MATE, MetA-Train to Explain) jointly trains a model to solve the original task,
e.g., node classification, and to provide easily processable outputs for
downstream algorithms that explain the model's decisions in a human-friendly
way. In particular, we meta-train the model's parameters to quickly minimize
the error of an instance-level GNNExplainer trained on-the-fly on randomly
sampled nodes. The final internal representation relies upon a set of features
that can be `better' understood by an explanation algorithm, e.g., another
instance of GNNExplainer. Our model-agnostic approach can improve the
explanations produced for different GNN architectures and use any
instance-based explainer to drive this process. Experiments on synthetic and
real-world datasets for node and graph classification show that we can produce
models that are consistently easier to explain by different algorithms.
Furthermore, this increase in explainability comes at no cost for the accuracy
of the model.
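The bi-level training idea can be sketched as follows (a toy simplification of ours: a linear model and a mask-based explanation loss stand in for a GNN and GNNExplainer; names and loss weights are assumptions):

```python
import torch
import torch.nn.functional as F

def explainer_loss(model, x, mask):
    # Toy proxy: an explanation (input mask) is good if the masked input
    # reproduces the model's output while the mask stays sparse.
    return ((model(x * mask) - model(x)) ** 2).mean() + 0.1 * mask.abs().mean()

model = torch.nn.Linear(8, 2)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x, y = torch.randn(64, 8), torch.randint(0, 2, (64,))

for step in range(50):
    # Inner loop: fit a fresh explainer mask on the fly.
    mask = torch.ones(8, requires_grad=True)
    inner_opt = torch.optim.SGD([mask], lr=0.5)
    for _ in range(5):
        inner_opt.zero_grad()
        explainer_loss(model, x, mask).backward()
        inner_opt.step()
    # Outer step: task loss plus how well the quick explainer converged.
    loss = F.cross_entropy(model(x), y) \
        + 0.1 * explainer_loss(model, x, mask.detach())
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))
```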
【5】 Network Clustering by Embedding of Attribute-augmented Graphs
Link: https://arxiv.org/abs/2109.09367
Authors: Pasqua D'Ambra, Clara De Santis, Panayot S. Vassilevski, Luisa Cutillo
Comments: 29 pages, 14 figures, preprint
Abstract: In this paper we propose a new approach to detect clusters in undirected
graphs with attributed vertices. The aim is to group vertices which are similar
not only in terms of structural connectivity but also in terms of attribute
values. We incorporate structural and attribute similarities between the
vertices in an augmented graph by creating additional vertices and edges as
proposed in [5, 27]. The augmented graph is embedded in a Euclidean space
associated with its Laplacian, and a modified K-means algorithm is applied to
identify clusters. The modified K-means uses a vector distance measure in
which each original vertex is assigned a vector-valued set of coordinates
depending on both structural connectivity and attribute similarities. To
define the
coordinate vectors we employ an adaptive AMG (Algebraic MultiGrid) method to
identify the coordinate directions in the embedding Euclidean space extending
our previous result for graphs without attributes. We demonstrate the
effectiveness of our proposed clustering method on both synthetic and
real-world attributed graphs.
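A small end-to-end sketch of the construction (ours; the paper uses an adaptive AMG method to define the embedding, for which we substitute a plain Laplacian eigendecomposition):

```python
import numpy as np
from sklearn.cluster import KMeans

A = np.array([[0, 1, 1, 0],      # adjacency of 4 original vertices
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
attrs = [0, 0, 1, 1]             # one categorical attribute, two values

n, m = len(A), len(set(attrs))
aug = np.zeros((n + m, n + m))
aug[:n, :n] = A
for v, a in enumerate(attrs):    # connect each vertex to its attribute vertex
    aug[v, n + a] = aug[n + a, v] = 1.0

L = np.diag(aug.sum(1)) - aug    # Laplacian of the augmented graph
_, vecs = np.linalg.eigh(L)
emb = vecs[:n, 1:3]              # nontrivial eigenvectors, original vertices only
print(KMeans(n_clusters=2, n_init=10).fit_predict(emb))
```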
【6】 Grouping Search Results with Product Graphs in E-commerce Platforms
Link: https://arxiv.org/abs/2109.09349
Authors: Suhas Ranganath, Shibsankar Das, Sanjay Thilaivasan, Shipra Agarwal, Varun Shrivastava
Affiliations: Walmart Labs, Bangalore, India
Abstract: Showing relevant search results to the user is the primary challenge for any
search system. Walmart e-commerce provides an omnichannel search platform
where its customers can search across millions of products. This search
platform takes a textual query as input and shows relevant items from the
catalog. One of the primary challenges is that such queries are complex to
understand, as they often contain multiple intents. This paper proposes a
framework to group search results into multiple ranked lists intended to
better capture user intent. The framework creates a product graph encoding
relations between product entities and utilizes it to group search results
into a series of stacks, where each stack provides a group of items based on a
precise intent.
As an example, for a query "milk," the results can be grouped into multiple
stacks of "white milk", "low-fat milk", "almond milk", "flavored milk". We
measure the impact of our algorithm by evaluating how it improves the user
experience both in terms of search quality relevance and user behavioral
signals like Add-To-Cart.
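The grouping step can be illustrated with a toy product graph (hypothetical data and relation, not Walmart's system):

```python
from collections import defaultdict

# Hypothetical product-graph edges: item -> intent-level head product.
product_graph = {
    "whole milk": "white milk", "skim milk": "low-fat milk",
    "2% milk": "low-fat milk", "almond milk": "almond milk",
    "chocolate milk": "flavored milk",
}
results = ["whole milk", "2% milk", "almond milk", "chocolate milk", "skim milk"]

stacks = defaultdict(list)
for item in results:                 # ranked order is preserved within a stack
    stacks[product_graph.get(item, "other")].append(item)
for intent, items in stacks.items():
    print(intent, "->", items)
```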
【7】 Feature Correlation Aggregation: on the Path to Better Graph Neural Networks
Link: https://arxiv.org/abs/2109.09300
Authors: Jieming Zhou, Tong Zhang, Pengfei Fang, Lars Petersson, Mehrtash Harandi
Affiliations: Australian National University, Canberra, Australia; École Polytechnique Fédérale de Lausanne, Switzerland; Data61, CSIRO; Monash University, Clayton VIC, Australia
Abstract: Prior to the introduction of Graph Neural Networks (GNNs), modeling and
analyzing irregular data, particularly graphs, was thought to be the Achilles'
heel of deep learning. The core concept of GNNs is to find a representation by
recursively aggregating the representations of a central node and those of its
neighbors, and its success has been demonstrated by many GNN designs. However,
most of them only focus on using the first-order information between a node
and its neighbors. In this paper, we introduce a central node
permutation-variant function through a frustratingly simple and
innocent-looking modification to the core operation of a GNN, namely the
Feature cOrrelation aGgregation (FOG) module, which learns second-order
information from the feature correlation between a node and its neighbors in
the pipeline. By adding FOG into existing variants of GNNs, we empirically
verify that this second-order information complements the features generated
by original GNNs across a broad set of benchmarks. A tangible boost in the
performance of the model is observed, with the model surpassing previous
state-of-the-art results by a significant margin while employing fewer
parameters (e.g., a 33.116% improvement on a real-world molecular dataset
using graph convolutional networks).
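A hedged reading of the second-order aggregation in code (our sketch, not the official FOG module): correlate a node's features with its mean-aggregated neighbor features via an outer product, then project back to the hidden size.

```python
import torch
import torch.nn as nn

class FOG(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim * dim, dim)

    def forward(self, x, adj):            # x: (N, d), adj: dense (N, N)
        neigh = adj @ x / adj.sum(1, keepdim=True).clamp(min=1)
        corr = torch.einsum("ni,nj->nij", x, neigh)   # (N, d, d) correlations
        return torch.relu(self.proj(corr.flatten(1)))

x, adj = torch.randn(5, 4), (torch.rand(5, 5) > 0.5).float()
print(FOG(4)(x, adj).shape)   # torch.Size([5, 4])
```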
【8】 Temporal Knowledge Graph Completion using Box Embeddings
Link: https://arxiv.org/abs/2109.08970
Authors: Johannes Messner, Ralph Abboud, İsmail İlkan Ceylan
Affiliations: Department of Computer Science, University of Oxford, UK
Abstract: Knowledge graph completion is the task of inferring missing facts based on
existing data in a knowledge graph. Temporal knowledge graph completion (TKGC)
is an extension of this task to temporal knowledge graphs, where each fact is
additionally associated with a time stamp. Current approaches for TKGC
primarily build on existing embedding models which are developed for (static)
knowledge graph completion, and extend these models to incorporate time, where
the idea is to learn latent representations for entities, relations, and
timestamps and then use the learned representations to predict missing facts at
various time steps. In this paper, we propose BoxTE, a box embedding model for
TKGC, building on the static knowledge graph embedding model BoxE. We show that
BoxTE is fully expressive, and possesses strong inductive capacity in the
temporal setting. We then empirically evaluate our model and show that it
achieves state-of-the-art results on several TKGC benchmarks.
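As a rough illustration of the box-embedding family BoxTE builds on (our toy, not the paper's scoring function): entities are points, relations define boxes, and a timestamp shifts ("bumps") the entity point before the in-box test.

```python
import numpy as np

def outside_box_distance(point, center, width):
    """Zero inside the box; grows linearly with distance outside it."""
    return np.maximum(np.abs(point - center) - width / 2, 0).sum()

entity = {"ann": np.array([0.2, 0.1])}
time_bump = {2020: np.array([0.0, 0.1]), 2021: np.array([0.3, 0.0])}
rel_box = (np.array([0.2, 0.2]), np.array([0.5, 0.5]))  # (center, width)

for t in (2020, 2021):
    p = entity["ann"] + time_bump[t]   # the timestamp shifts the entity point
    print(t, outside_box_distance(p, *rel_box))
```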
【9】 Releasing Graph Neural Networks with Differential Privacy Guarantees
Link: https://arxiv.org/abs/2109.08907
Authors: Iyiola E. Olatunji, Thorben Funke, Megha Khosla
Affiliations: L3S Research Center, Hannover, Germany
Abstract: With the increasing popularity of Graph Neural Networks (GNNs) in several
sensitive applications like healthcare and medicine, concerns have been raised
over the privacy aspects of trained GNNs. More notably, GNNs are vulnerable to
privacy attacks, such as membership inference attacks, even if only blackbox
access to the trained model is granted. To build defenses, differential privacy
has emerged as a mechanism to disguise the sensitive data in training datasets.
Following the strategy of Private Aggregation of Teacher Ensembles (PATE),
recent methods leverage a large ensemble of teacher models. These teachers are
trained on disjoint subsets of private data and are employed to transfer
knowledge to a student model, which is then released with privacy guarantees.
However, splitting graph data into many disjoint training sets may destroy the
structural information and adversely affect accuracy. We propose a new
graph-specific scheme of releasing a student GNN, which avoids splitting
private training data altogether. The student GNN is trained using public data,
partly labeled privately using the teacher GNN models trained exclusively for
each query node. We theoretically analyze our approach in the Rényi
differential privacy framework and provide privacy guarantees. Besides, we show
the solid experimental performance of our method compared to several baselines,
including the PATE baseline adapted for graph-structured data. Our anonymized
code is available.
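The PATE-style private labeling step the scheme builds on can be sketched as follows (generic illustration, not the paper's exact mechanism): each teacher votes a label for a query node, and the released label is the noisy argmax of the vote histogram.

```python
import numpy as np

def noisy_label(teacher_votes, n_classes, noise_scale=1.0, seed=0):
    rng = np.random.default_rng(seed)
    hist = np.bincount(teacher_votes, minlength=n_classes).astype(float)
    hist += rng.laplace(scale=noise_scale, size=n_classes)  # DP noise
    return int(np.argmax(hist))

votes = np.array([2, 2, 1, 2, 0, 2, 1, 2])   # 8 hypothetical teachers
print(noisy_label(votes, n_classes=3))
```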
【10】 Efficient Variational Graph Autoencoders for Unsupervised Cross-domain Prerequisite Chains
Link: https://arxiv.org/abs/2109.08722
Authors: Irene Li, Vanessa Yan, Dragomir Radev
Affiliations: Department of Computer Science, Yale University
Abstract: Prerequisite chain learning helps people acquire new knowledge efficiently.
While people may quickly determine learning paths over concepts in a domain,
finding such paths in other domains can be challenging. We introduce
Domain-Adversarial Variational Graph Autoencoders (DAVGAE) to solve this
cross-domain prerequisite chain learning task efficiently. Our novel model
consists of a variational graph autoencoder (VGAE) and a domain discriminator.
The VGAE is trained to predict concept relations through link prediction, while
the domain discriminator takes both source and target domain data as input and
is trained to predict domain labels. Most importantly, this method only needs
simple homogeneous graphs as input, compared with the current state-of-the-art
model. We evaluate our model on the LectureBankCD dataset, and results show
that our model outperforms recent graph-based benchmarks while using only 1/10
of the graph scale and 1/3 of the computation time.
Transformer (1 paper)
【1】 Dyadformer: A Multi-modal Transformer for Long-Range Modeling of Dyadic Interactions
Link: https://arxiv.org/abs/2109.09487
Authors: David Curto, Albert Clapés, Javier Selva, Sorina Smeureanu, Julio C. S. Jacques Junior, David Gallardo-Pujol, Georgina Guilera, David Leiva, Thomas B. Moeslund, Sergio Escalera, Cristina Palmero
Affiliations: Universitat de Barcelona; Universitat Politècnica de Catalunya; Computer Vision Center; Aalborg University
Comments: Accepted to the 2021 ICCV Workshop on Understanding Social Behavior in Dyadic and Small Group Interactions
Abstract: Personality computing has become an emerging topic in computer vision, due to
the wide range of applications it can be used for. However, most works on the
topic have focused on analyzing the individual, even when applied to
interaction scenarios, and for short periods of time. To address these
limitations, we present the Dyadformer, a novel multi-modal multi-subject
Transformer architecture to model individual and interpersonal features in
dyadic interactions using variable time windows, thus allowing the capture of
long-term interdependencies. Our proposed cross-subject layer allows the
network to explicitly model interactions among subjects through attentional
operations. This proof-of-concept approach shows how multi-modality and joint
modeling of both interactants for longer periods of time helps to predict
individual attributes. With Dyadformer, we improve state-of-the-art
self-reported personality inference results on individual subjects on the UDIVA
v0.5 dataset.
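A minimal sketch of a cross-subject attention layer in the spirit of the abstract (our reading, not the released architecture): each subject's token sequence attends to the other subject's sequence.

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
subj_a = torch.randn(2, 50, 64)   # (batch, time, feat) for subject A
subj_b = torch.randn(2, 50, 64)   # same time window for subject B

a_attends_b, _ = attn(query=subj_a, key=subj_b, value=subj_b)
b_attends_a, _ = attn(query=subj_b, key=subj_a, value=subj_a)
print(a_attends_b.shape, b_attends_a.shape)
```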
GAN | adversarial | attacks | generation (8 papers)
【1】 CARL: Conditional-value-at-risk Adversarial Reinforcement Learning
Link: https://arxiv.org/abs/2109.09470
Authors: M. Godbout, M. Heuillet, S. Chandra, R. Bhati, A. Durand
Affiliations: Université Laval
Abstract: In this paper we present a risk-averse reinforcement learning (RL) method
called Conditional value-at-risk Adversarial Reinforcement Learning (CARL). To
the best of our knowledge, CARL is the first game formulation for Conditional
Value-at-Risk (CVaR) RL. The game takes place between a policy player and an
adversary that perturbs the policy player's state transitions given a finite
budget. We prove that, at the maximin equilibrium point, the learned policy is
CVaR optimal with a risk tolerance explicitly related to the adversary's
budget. We provide a gradient-based training procedure to solve CARL by
formulating it as a zero-sum Stackelberg Game, enabling the use of deep
reinforcement learning architectures and training algorithms. Finally, we show
that solving the CARL game does lead to risk-averse behaviour in a toy grid
environment, also confirming that an increased adversary budget produces
increasingly cautious policies.
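For reference, the risk measure CARL optimizes, CVaR, is just the mean of the worst tail of outcomes; a quick numeric illustration (ours):

```python
import numpy as np

def cvar(returns, alpha=0.9):
    """Mean of the worst (1 - alpha) fraction of returns."""
    k = max(1, int(len(returns) * (1 - alpha)))
    return np.sort(returns)[:k].mean()

rng = np.random.default_rng(0)
returns = rng.normal(1.0, 2.0, size=10_000)
print(cvar(returns, alpha=0.9))   # mean of the worst 10% of outcomes
```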
【2】 Generalized Translation and Scale Invariant Online Algorithm for Adversarial Multi-Armed Bandits
Link: https://arxiv.org/abs/2109.09212
Authors: Kaan Gokcesu, Hakan Gokcesu
Comments: arXiv admin note: substantial text overlap with arXiv:2009.04372
Abstract: We study the adversarial multi-armed bandit problem and create a completely
online algorithmic framework that is invariant under arbitrary translations and
scales of the arm losses. We study the expected performance of our algorithm
against a generic competition class, which makes it applicable for a wide
variety of problem scenarios. Our algorithm works from a universal prediction
perspective and the performance measure used is the expected regret against
arbitrary arm selection sequences, which is the difference between our losses
and a competing loss sequence. The competition class can be designed to include
fixed arm selections, switching bandits, contextual bandits, or any other
competition of interest. The sequences in the competition class are generally
determined by the specific application at hand and should be designed
accordingly. Our algorithm neither uses nor needs any preliminary information
about the loss sequences and is completely online. Its performance bounds are
second-order bounds in terms of the sum of squared losses, where any affine
transform of the losses has no effect on the normalized regret.
【3】 ComicGAN: Text-to-Comic Generative Adversarial Network
Link: https://arxiv.org/abs/2109.09120
Authors: Ben Proven-Bessel, Zilong Zhao, Lydia Chen
Affiliations: Delft University of Technology
Abstract: Drawing and annotating comic illustrations is a complex and difficult
process. No existing machine learning algorithms have been developed to create
comic illustrations based on descriptions of illustrations, or the dialogue in
comics. Moreover, it is not known if a generative adversarial network (GAN) can
generate original comics that correspond to the dialogue and/or descriptions.
GANs are successful in producing photo-realistic images, but this technology
does not necessarily translate to generation of flawless comics. What is more,
comic evaluation is a prominent challenge as common metrics such as Inception
Score will not perform comparably, as they are designed to work on photos. In
this paper: 1. We implement ComicGAN, a novel text-to-comic pipeline based on a
text-to-image GAN that synthesizes comics according to text descriptions. 2. We
describe an in-depth empirical study of the technical difficulties of comic
generation using GANs. ComicGAN has two novel features: (i) text description
creation from labels via permutation and augmentation, and (ii) custom image
encoding with Convolutional Neural Networks. We extensively evaluate the
proposed ComicGAN in two scenarios, namely image generation from descriptions,
and image generation from dialogue. Our results on 1000 Dilbert comic panels
and 6000 descriptions show that synthetic comic panels from text inputs resemble
original Dilbert panels. Novel methods for text description creation and custom
image encoding brought improvements to Frechet Inception Distance, detail, and
overall image quality over baseline algorithms. Generating illustrations from
descriptions provided clear comics including characters and colours that were
specified in the descriptions.
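The first novelty, building text descriptions from panel labels via permutation and augmentation, can be sketched like this (hypothetical labels and templates, not the authors' code):

```python
import itertools
import random

labels = ["dilbert", "desk", "talking"]            # hypothetical panel labels
templates = ["a comic with {}", "panel showing {}"]

def descriptions(labels, n=4, seed=0):
    """Create varied text descriptions from one label set."""
    random.seed(seed)
    out = [random.choice(templates).format(", ".join(perm))
           for perm in itertools.permutations(labels)]
    return random.sample(out, n)

print(descriptions(labels))
```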
【4】 Hydroelectric Generation Forecasting with Long Short Term Memory (LSTM) Based Deep Learning Model for Turkey
Link: https://arxiv.org/abs/2109.09013
Authors: Mehmet Bulut
Affiliations: Electricity Generation Inc., General Directorate, Ankara, Turkey; Atilim University, School of Civil Aviation, Ankara, Turkey
Abstract: Hydroelectricity is one of the renewable energy sources that has been used
for many years in Turkey. The production of hydraulic power plants based on
water reservoirs varies with different parameters, so the estimation of
hydraulic production gains importance for the planning of electricity
generation. In this article, Turkey's monthly hydroelectricity production is
estimated with a long short-term memory (LSTM) network-based deep learning
model. The designed deep learning model is based on hydraulic production time
series and future production planning for many years. Using real production
data and different LSTM deep learning models, their performance on the monthly
forecast of hydraulic electricity generation for the next year is examined.
The results show that combining time series of many years of real production
data with a deep learning model is successful for long-term prediction. In
terms of RMSE and MAPE values, the 100-layer LSTM model using 120 months (10
years) of hydroelectric generation data achieves the highest estimation
accuracy, with a MAPE of 0.1311 (13.1%) on the annual total and 1.09% as the
monthly average distribution. The best RMSE results were obtained for the
100-layer LSTM model using 144 months (12 years) of hydroelectric generation
data, with an RMSE of 29,689 annually and 2,474.08 in monthly distribution.
Based on these results, time data covering at least 120 months of production
is recommended to create an acceptable hydropower forecasting model with LSTM.
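A minimal PyTorch sketch of such a forecaster (ours; the paper gives no code, and we read the "100-layer" configuration as a hidden size of 100, which is an assumption):

```python
import torch
import torch.nn as nn

class HydroLSTM(nn.Module):
    def __init__(self, hidden=100):      # assumed hidden size, see lead-in
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                # x: (batch, 120, 1) monthly values
        out, _ = self.lstm(x)
        return self.head(out[:, -1])     # predict the next month's generation

model = HydroLSTM()
window = torch.randn(8, 120, 1)          # 8 windows of 120 monthly values
print(model(window).shape)               # torch.Size([8, 1])
```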
【5】 PluGeN: Multi-Label Conditional Generation From Pre-Trained Models
Link: https://arxiv.org/abs/2109.09011
Authors: Maciej Wołczyk, Magdalena Proszewska, Łukasz Maziarka, Maciej Zięba, Patryk Wielopolski, Rafał Kurczab, Marek Śmieja
Affiliations: Jagiellonian University; Wroclaw University of Science and Technology; Institute of Pharmacology PAS
Abstract: Modern generative models achieve excellent quality in a variety of tasks
including image or text generation and chemical molecule modeling. However,
existing methods often lack the essential ability to generate examples with
requested properties, such as the age of the person in the photo or the weight
of the generated molecule. Incorporating such additional conditioning factors
would require rebuilding the entire architecture and optimizing the parameters
from scratch. Moreover, it is difficult to disentangle selected attributes so
as to edit only one attribute while leaving the others unchanged.
To overcome these limitations we propose PluGeN (Plugin Generative Network), a
simple yet effective generative technique that can be used as a plugin to
pre-trained generative models. The idea behind our approach is to transform the
entangled latent representation using a flow-based module into a
multi-dimensional space where the values of each attribute are modeled as an
independent one-dimensional distribution. In consequence, PluGeN can generate
new samples with desired attributes as well as manipulate labeled attributes of
existing examples. Due to the disentangling of the latent representation, we
are even able to generate samples with rare or unseen combinations of
attributes in the dataset, such as a young person with gray hair, men with
make-up, or women with beards. We combined PluGeN with GAN and VAE models and
applied it to conditional generation and manipulation of images and chemical
molecule modeling. Experiments demonstrate that PluGeN preserves the quality of
backbone models while adding the ability to control the values of labeled
attributes.
【6】 Manifold-preserved GANs
Link: https://arxiv.org/abs/2109.08955
Authors: Haozhe Liu, Hanbang Liang, Xianxu Hou, Haoqian Wu, Feng Liu, Linlin Shen
Affiliations: Shenzhen Institute of Artificial Intelligence and Robotics for Society, China; Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University
Abstract: Generative Adversarial Networks (GANs) have been widely adopted in various
fields. However, existing GANs generally are not able to preserve the manifold
of data space, mainly due to the simple representation of discriminator for the
real/generated data. To address such open challenges, this paper proposes
Manifold-preserved GANs (MaF-GANs), which generalize Wasserstein GANs into
high-dimensional form. Specifically, to improve the representation of data, the
discriminator in MaF-GANs is designed to map data into a high-dimensional
manifold. Furthermore, to stabilize the training of MaF-GANs, an operation
with a precise and universal solution for any K-Lipschitz continuity, called
Topological Consistency, is proposed. The effectiveness of the proposed method
is justified by both theoretical analysis and empirical results. When adopting
DCGAN as the backbone on CelebA (256*256), the proposed method achieved 12.43
FID, which outperforms the state-of-the-art model like Realness GAN (23.51 FID)
by a large margin. Code will be made publicly available.
【7】 S$^3$VAADA: Submodular Subset Selection for Virtual Adversarial Active Domain Adaptation
Link: https://arxiv.org/abs/2109.08901
Authors: Harsh Rangwani, Arihant Jain, Sumukh K Aithal, R. Venkatesh Babu
Affiliations: Video Analytics Lab, Indian Institute of Science, Bengaluru, India
Comments: ICCV 2021. Project page: this http URL
Abstract: Unsupervised domain adaptation (DA) methods have focused on achieving maximal
performance through aligning features from source and target domains without
using labeled data in the target domain. However, in real-world scenarios it
might be feasible to get labels for a small proportion of target data. In
these scenarios, it is important to select maximally-informative samples to
label and find an effective way to combine them with the existing knowledge
from source data. Towards achieving this, we propose S$^3$VAADA, which i)
introduces a novel submodular criterion to select a maximally informative
subset to label and ii) enhances a cluster-based DA procedure through novel
improvements to effectively utilize all the available data for improving
generalization on target. Our approach consistently outperforms the competing
state-of-the-art approaches on datasets with varying degrees of domain shifts.
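Greedy maximization of a submodular criterion, the backbone of step i), looks like this (we use a generic facility-location objective for illustration; the paper's actual criterion combines informativeness and diversity differently):

```python
import numpy as np

def greedy_select(sim, budget):
    """sim: (n, n) pairwise similarities; greedily pick covering points."""
    n, chosen = len(sim), []
    covered = np.zeros(n)
    for _ in range(budget):
        gains = [np.maximum(covered, sim[j]).sum() - covered.sum()
                 for j in range(n)]                   # marginal coverage gain
        best = int(np.argmax(gains))
        chosen.append(best)
        covered = np.maximum(covered, sim[best])
    return chosen

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
sim = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))  # RBF similarities
print(greedy_select(sim, budget=5))
```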
【8】 Scenario adaptive disruption prediction study for next generation burning-plasma tokamaks
Link: https://arxiv.org/abs/2109.08956
Authors: J. Zhu, C. Rea, R. S. Granetz, E. S. Marmar, K. J. Montes, R. Sweeney, R. A. Tinguely, D. L. Chen, B. Shen, B. J. Xiao, D. Humphreys, J. Barr, O. Meneghini
Affiliations: Plasma Science and Fusion Center, Massachusetts Institute of Technology, Cambridge, MA, USA; Institute of Plasma Physics, Chinese Academy of Sciences, Hefei, Anhui, China; General Atomics, San Diego, CA, USA
Abstract: Next generation high performance (HP) tokamaks risk damage from unmitigated
disruptions at high current and power. Achieving reliable disruption prediction
for a device's HP operation based on its low performance (LP) data is key to
success. In this letter, through explorative data analysis and dedicated
numerical experiments on multiple existing tokamaks, we demonstrate how the
operational regimes of tokamaks can affect the power of a trained disruption
predictor. First, our results suggest data-driven disruption predictors trained
on abundant LP discharges work poorly on the HP regime of the same tokamak,
which is a consequence of the distinct distributions of the tightly correlated
signals related to disruptions in these two regimes. Second, we find that
matching operational parameters among tokamaks strongly improves cross-machine
accuracy, which implies our model learns from the underlying scalings of
dimensionless physics parameters like $q_{95}$ and $\beta_{p}$, and confirms
the importance of these parameters in disruption physics and cross-machine
domain
matching from the data-driven perspective. Finally, our results show how in the
absence of HP data from the target devices, the best predictivity of the HP
regime for the target machine can be achieved by combining LP data from the
target with HP data from other machines. These results provide a possible
disruption predictor development strategy for next generation tokamaks, such as
ITER and SPARC, and highlight the importance of developing, on existing
machines, baseline scenario discharges of future tokamaks in order to collect
more relevant disruptive data.
Semi-/weakly-/un-/fully-supervised | uncertainty | active learning (12 papers)
【1】 Trust Your Robots! Predictive Uncertainty Estimation of Neural Networks with Sparse Gaussian Processes
Link: https://arxiv.org/abs/2109.09690
Authors: Jongseok Lee, Jianxiang Feng, Matthias Humt, Marcus Müller, Rudolph Triebel
Affiliations: Institute of Robotics and Mechatronics, German Aerospace Center (DLR); High Performance Humanoid Technologies, Karlsruhe Institute of Technology (KIT); Chair of Computer Vision and Artificial Intelligence, Technical University of Munich (TUM)
Comments: 12 pages, 6 figures and 1 table. Accepted at the 5th Conference on Robot Learning (CoRL 2021), London, UK
Abstract: This paper presents a probabilistic framework to obtain both reliable and
fast uncertainty estimates for predictions with Deep Neural Networks (DNNs).
Our main contribution is a practical and principled combination of DNNs with
sparse Gaussian Processes (GPs). We prove theoretically that DNNs can be seen
as a special case of sparse GPs, namely mixtures of GP experts (MoE-GP), and we
devise a learning algorithm that brings the derived theory into practice. In
experiments from two different robotic tasks -- inverse dynamics of a
manipulator and object detection on a micro-aerial vehicle (MAV) -- we show the
effectiveness of our approach in terms of predictive uncertainty, improved
scalability, and run-time efficiency on a Jetson TX2. We thus argue that our
approach can pave the way towards reliable and fast robot learning systems with
uncertainty awareness.
【2】 Advancing Self-supervised Monocular Depth Learning with Sparse LiDAR
Link: https://arxiv.org/abs/2109.09628
Authors: Ziyue Feng, Longlong Jing, Peng Yin, Yingli Tian, Bing Li
Affiliations: Clemson University; The City University of New York; Carnegie Mellon University
Comments: Accepted by CoRL 2021
Abstract: Self-supervised monocular depth prediction provides a cost-effective solution
to obtain the 3D location of each pixel. However, the existing approaches
usually lead to unsatisfactory accuracy, which is critical for autonomous
robots. In this paper, we propose a novel two-stage network to advance the
self-supervised monocular dense depth learning by leveraging low-cost sparse
(e.g. 4-beam) LiDAR. Unlike the existing methods that use sparse LiDAR mainly
in a manner of time-consuming iterative post-processing, our model fuses
monocular image features and sparse LiDAR features to predict initial depth
maps. Then, an efficient feed-forward refinement network is further designed to
correct the errors in these initial depth maps in pseudo-3D space with
real-time performance. Extensive experiments show that our proposed model
significantly outperforms all the state-of-the-art self-supervised methods, as
well as the sparse-LiDAR-based methods on both self-supervised monocular depth
prediction and completion tasks. With the accurate dense depth prediction, our
model outperforms the state-of-the-art sparse-LiDAR-based method
(Pseudo-LiDAR++) by more than 68% for the downstream task monocular 3D object
detection on the KITTI Leaderboard.
【3】 Modeling Annotation Uncertainty with Gaussian Heatmaps in Landmark Localization
Link: https://arxiv.org/abs/2109.09533
Authors: Franz Thaler, Christian Payer, Martin Urschler, Darko Stern
Affiliations: Gottfried Schatz Research Center: Biophysics, Medical University of Graz, Austria; Institute of Computer Graphics and Vision, Graz University of Technology, Austria; School of Computer Science, The University of Auckland, New Zealand
Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) this https URL
Abstract: In landmark localization, due to ambiguities in defining the exact
position, landmark annotations may suffer from large observer variabilities,
which result in uncertain annotations. To model the annotation ambiguities of
the training dataset, we propose to learn anisotropic Gaussian parameters
modeling the shape of the target heatmap during optimization. Furthermore, our
method models the prediction uncertainty of individual samples by fitting
anisotropic Gaussian functions to the predicted heatmaps during inference.
Besides state-of-the-art results, our experiments on datasets of hand
radiographs and lateral cephalograms also show that Gaussian functions are
correlated with both localization accuracy and observer variability. As a final
experiment, we show the importance of integrating the uncertainty into decision
making by measuring the influence of the predicted location uncertainty on the
classification of anatomical abnormalities in lateral cephalograms.
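The central object, an anisotropic Gaussian heatmap parameterized by a full covariance, is easy to sketch (our code; in the method the covariance is learned during optimization):

```python
import numpy as np

def anisotropic_heatmap(shape, mean, cov):
    """2D Gaussian heatmap with a full covariance (anisotropic shape)."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    d = np.stack([ys - mean[0], xs - mean[1]], axis=-1)   # (H, W, 2)
    e = np.einsum("hwi,ij,hwj->hw", d, np.linalg.inv(cov), d)
    return np.exp(-0.5 * e)

hm = anisotropic_heatmap((64, 64), mean=(32, 20),
                         cov=np.array([[40.0, 15.0], [15.0, 10.0]]))
print(hm.shape, round(float(hm.max()), 3))
```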
【4】 Unsupervised domain adaptation with non-stochastic missing data
Link: https://arxiv.org/abs/2109.09505
Authors: Matthieu Kirchmeyer, Patrick Gallinari, Alain Rakotomamonjy, Amin Mantrach
Affiliations: Sorbonne Université, CNRS, LIP, Paris, France; Criteo AI Lab, Paris, France; Université de Rouen, LITIS, France; Amazon, Luxembourg
Abstract: We consider unsupervised domain adaptation (UDA) for classification problems
in the presence of missing data in the unlabelled target domain. More
precisely, motivated by practical applications, we analyze situations where
distribution shift exists between domains and where some components are
systematically absent on the target domain without available supervision for
imputing the missing target components. We propose a generative approach for
imputation. Imputation is performed in a domain-invariant latent space and
leverages indirect supervision from a complete source domain. We introduce a
single model performing joint adaptation, imputation and classification which,
under our assumptions, minimizes an upper bound of its target generalization
error and performs well under various representative divergence families
(H-divergence, Optimal Transport). Moreover, we compare the target error of our
Adaptation-imputation framework and the "ideal" target error of a UDA
classifier without missing target components. Our model is further improved
with self-training, to bring the learned source and target class posterior
distributions closer. We perform experiments on three families of datasets of
different modalities: a classical digit classification benchmark and the
Amazon product reviews dataset, both commonly used in UDA, as well as
real-world digital advertising datasets. We show the benefits of jointly
performing adaptation,
classification and imputation on these datasets.
【5】 Deep Quantile Regression for Uncertainty Estimation in Unsupervised and Supervised Lesion Detection
Link: https://arxiv.org/abs/2109.09374
Authors: Haleh Akrami, Anand Joshi, Sergul Aydore, Richard Leahy
Affiliations: Department of Biomedical Engineering, University of Southern California, Los Angeles, USA; Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, USA; Amazon Web Services, New York, USA
Comments: arXiv admin note: substantial text overlap with arXiv:2010.09042
Abstract: Despite impressive state-of-the-art performance on a wide variety of machine
learning tasks in multiple applications, deep learning methods can produce
over-confident predictions, particularly with limited training data. Therefore,
quantifying uncertainty is particularly important in critical applications such
as anomaly or lesion detection and clinical diagnosis, where a realistic
assessment of uncertainty is essential in determining surgical margins, disease
status and appropriate treatment. In this work, we focus on using quantile
regression to estimate aleatoric uncertainty and use it for estimating
uncertainty in both supervised and unsupervised lesion detection problems. In
the unsupervised setting, we apply quantile regression to a lesion detection
task using Variational AutoEncoder (VAE). The VAE models the output as a
conditionally independent Gaussian characterized by means and variances for
each output dimension. Unfortunately, joint optimization of both mean and
variance in the VAE leads to the well-known problem of shrinkage or
underestimation of variance. We describe an alternative VAE model,
Quantile-Regression VAE (QR-VAE), that avoids this variance shrinkage problem
by estimating conditional quantiles for the given input image. Using the
estimated quantiles, we compute the conditional mean and variance for input
images under the conditionally Gaussian model. We then compute reconstruction
probability using this model as a principled approach to outlier or anomaly
detection applications. In the supervised setting, we develop binary quantile
regression (BQR) for the supervised lesion segmentation task. BQR segmentation
can capture uncertainty in label boundaries. We show how quantile regression
can be used to characterize expert disagreement in the location of lesion
boundaries.
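Quantile regression rests on the pinball loss; a minimal sketch (ours, not the QR-VAE code) that also shows a quantile one might fit for a Gaussian-style one-sigma band:

```python
import torch

def pinball_loss(pred, target, q):
    """Quantile (pinball) loss: minimized by the q-th conditional quantile."""
    err = target - pred
    return torch.maximum(q * err, (q - 1) * err).mean()

pred = torch.zeros(16, requires_grad=True)
target = torch.randn(16)
loss = pinball_loss(pred, target, q=0.841)  # upper quantile of a +/-1 sigma band
loss.backward()
print(float(loss))
```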
【6】 Unsupervised Continual Learning in Streaming Environments
Link: https://arxiv.org/abs/2109.09282
Authors: Andri Ashfahani, Mahardhika Pratama
Affiliations: School of Computer Science and Engineering, Nanyang Technological University
Comments: under review in IEEE journals
Abstract: A deep clustering network is desirable for data streams because of its
aptitude for extracting natural features, thus bypassing the laborious feature
engineering step. While automatic construction of deep networks in streaming
environments remains an open issue, it is also hindered by the expensive
labeling cost of data streams, driving increasing demand for unsupervised
approaches. This paper presents an unsupervised approach to deep clustering
network construction on the fly via simultaneous deep learning and clustering
termed Autonomous Deep Clustering Network (ADCN). It combines the feature
extraction layer and autonomous fully connected layer in which both network
width and depth are self-evolved from data streams based on the bias-variance
decomposition of reconstruction loss. The self-clustering mechanism is
performed in the deep embedding space of every fully connected layer while the
final output is inferred via the summation of cluster prediction score.
Further, a latent-based regularization is incorporated to resolve the
catastrophic forgetting issue. A rigorous numerical study has shown that ADCN
produces better performance compared to its counterparts while offering fully
autonomous construction of ADCN structure in streaming environments with the
absence of any labeled samples for model updates. To support the reproducible
research initiative, codes, supplementary material, and raw results of ADCN are
made available in \url{https://tinyurl.com/AutonomousDCN}.
【7】 A Study of the Generalizability of Self-Supervised Representations
Link: https://arxiv.org/abs/2109.09150
Authors: Atharva Tendle, Mohammad Rashedul Hasan
Affiliations: Department of Computer Science and Engineering, University of Nebraska-Lincoln, NE, USA
Abstract: Recent advancements in self-supervised learning (SSL) made it possible to
learn generalizable visual representations from unlabeled data. The performance
of Deep Learning models fine-tuned on pretrained SSL representations is on par
with models fine-tuned on the state-of-the-art supervised learning (SL)
representations. Irrespective of the progress made in SSL, its generalizability
has not been studied extensively. In this article, we perform a deeper analysis
of the generalizability of pretrained SSL and SL representations by conducting
a domain-based study for transfer learning classification tasks. The
representations are learned from the ImageNet source data, which are then
fine-tuned using two types of target datasets: similar to the source dataset,
and significantly different from the source dataset. We study generalizability
of the SSL and SL-based models via their prediction accuracy as well as
prediction confidence. In addition to this, we analyze the attribution of the
final convolutional layer of these models to understand how they reason about
the semantic identity of the data. We show that the SSL representations are
more generalizable as compared to the SL representations. We explain the
generalizability of the SSL representations by investigating its invariance
property, which is shown to be better than that observed in the SL
representations.
【8】 A framework for benchmarking uncertainty in deep regression
Link: https://arxiv.org/abs/2109.09048
Authors: Franko Schmähling, Jörg Martin, Clemens Elster
Affiliations: Physikalisch-Technische Bundesanstalt, Braunschweig and Berlin, Germany
Abstract: We propose a framework for the assessment of uncertainty quantification in
deep regression. The framework is based on regression problems where the
regression function is a linear combination of nonlinear functions. Basically,
any level of complexity can be realized through the choice of the nonlinear
functions and the dimensionality of their domain. Results of an uncertainty
quantification for deep regression are compared against those obtained by a
statistical reference method. The reference method utilizes knowledge of the
underlying nonlinear functions and is based on a Bayesian linear regression
using a reference prior. Reliability of uncertainty quantification is assessed
in terms of coverage probabilities, and accuracy through the size of calculated
uncertainties. We illustrate the proposed framework by applying it to current
approaches for uncertainty quantification in deep regression. The flexibility,
together with the availability of a reference solution, makes the framework
suitable for defining benchmark sets for uncertainty quantification.
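The framework's problem family, a regression function built as a linear combination of chosen nonlinear functions with known ground truth, can be sketched as follows (illustrative choices ours):

```python
import numpy as np

rng = np.random.default_rng(0)
basis = [np.sin, np.cos, lambda x: x ** 2]        # chosen nonlinear functions
coeffs = rng.normal(size=len(basis))              # known ground-truth weights

def f(x):
    return sum(c * g(x) for c, g in zip(coeffs, basis))

x = rng.uniform(-2.0, 2.0, size=200)
y = f(x) + rng.normal(scale=0.1, size=x.shape)    # noisy training targets
print(coeffs, y[:3])
```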
【9】 Intra-Inter Subject Self-supervised Learning for Multivariate Cardiac Signals
Link: https://arxiv.org/abs/2109.08908
Authors: Xiang Lan, Dianwen Ng, Shenda Hong, Mengling Feng
Affiliations: Saw Swee Hock School of Public Health, National University of Singapore; Institute of Data Science, National University of Singapore; School of Computer Science and Engineering, Nanyang Technological University, Singapore
Comments: preliminary version
Abstract: Learning information-rich and generalizable representations effectively from
unlabeled multivariate cardiac signals to identify abnormal heart rhythms
(cardiac arrhythmias) is valuable in real-world clinical settings but often
challenging due to its complex temporal dynamics. Cardiac arrhythmias can vary
significantly in temporal patterns even for the same patient ($i.e.$, intra
subject difference). Meanwhile, the same type of cardiac arrhythmia can show
different temporal patterns among different patients due to different cardiac
structures ($i.e.$, inter subject difference). In this paper, we address the
challenges by proposing an Intra-inter Subject self-supervised Learning (ISL)
model that is customized for multivariate cardiac signals. Our proposed ISL
model integrates medical knowledge into self-supervision to effectively learn
from intra-inter subject differences. In intra subject self-supervision, ISL
model first extracts heartbeat-level features from each subject using a
channel-wise attentional CNN-RNN encoder. Then a stationarity test module is
employed to capture the temporal dependencies between heartbeats. In inter
subject self-supervision, we design a set of data augmentations according to
the clinical characteristics of cardiac signals and perform contrastive
learning among subjects to learn distinctive representations for various types
of patients. Extensive experiments on three real-world datasets were conducted.
In a semi-supervised transfer learning scenario, our pre-trained ISL model
leads to about a 10% improvement over supervised training when only 1% of labeled data
is available, suggesting strong generalizability and robustness of the model.
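As one building block of the inter-subject self-supervision described above, a contrastive objective can be sketched as follows; this is a generic InfoNCE-style loss in PyTorch with random tensors standing in for encoder outputs, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    # Generic contrastive loss: two augmented views per recording;
    # matching rows are positives, all other rows in the batch negatives.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature       # pairwise cosine similarities
    labels = torch.arange(z1.size(0))        # view i of z1 matches row i of z2
    return F.cross_entropy(logits, labels)

# Stand-ins for embeddings of two clinically motivated augmentations of the
# same ECG segments, e.g. from a channel-wise attentional CNN-RNN encoder.
z_a, z_b = torch.randn(32, 128), torch.randn(32, 128)
print(info_nce(z_a, z_b).item())
```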
【10】 Primary Tumor and Inter-Organ Augmentations for Supervised Lymph Node Colon Adenocarcinoma Metastasis Detection
标题:用于监督式淋巴结结肠腺癌转移检测的原发肿瘤与器官间数据增强
链接:https://arxiv.org/abs/2109.09518
作者:Apostolia Tsirikoglou,Karin Stacke,Gabriel Eilertsen,Jonas Unger
机构: Department of Science and Technology, Linkoping University, Sweden, Center for Medical Image Science and Visualization, Linkoping University, Sweden, Sectra AB, Sweden
备注:International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2021
摘要:标记数据的稀缺性是为组织病理学应用开发准确、健壮的基于深度学习的模型的主要瓶颈。这一问题在淋巴结转移检测任务中尤为突出,因为该组织的肿瘤与非肿瘤比率较低,导致病理学家需要耗费大量人力和时间进行注释。这项工作探索了当目标域的表示有限或没有表示时,如何增加用于结肠癌转移检测的训练数据的替代方法。通过在有限的训练数据可用性下对交叉验证实验的详尽研究,我们评估了利用其他组织已有数据的器官间方法和利用原发肿瘤的器官内方法。这两种方法几乎不需要额外的注释工作。我们的结果表明,这些数据增强策略可以有效地提高转移检测的准确性,但最重要的是提高了鲁棒性。
摘要:The scarcity of labeled data is a major bottleneck for developing accurate
and robust deep learning-based models for histopathology applications. The
problem is notably prominent for the task of metastasis detection in lymph
nodes, due to the tissue's low tumor-to-non-tumor ratio, resulting in labor-
and time-intensive annotation processes for the pathologists. This work
explores alternatives on how to augment the training data for colon carcinoma
metastasis detection when there is limited or no representation of the target
domain. Through an exhaustive study of cross-validated experiments with limited
training data availability, we evaluate both an inter-organ approach utilizing
already available data for other tissues, and an intra-organ approach,
utilizing the primary tumor. Both these approaches result in little to no extra
annotation effort. Our results show that these data augmentation strategies can
be an efficient way of increasing accuracy on metastasis detection, but
foremost increase robustness.
【11】 Self-supervised learning methods and applications in medical imaging analysis: A survey
标题:自我监督学习方法及其在医学影像分析中的应用综述
链接:https://arxiv.org/abs/2109.08685
作者:Saeed Shurrab,Rehab Duwairi
机构:Department of Computer Information Systems, Jordan University of Science and Technology, Irbid
摘要:高质量带注释的医学影像数据集的可用性是一个与机器学习在医学影像分析领域的应用相冲突并阻碍其发展的主要问题。自监督学习是一种最新的训练范式,它可以在不需要人工标注的情况下学习鲁棒表示,这可以被认为是解决标注医学数据稀缺性的有效方法。本文综述了图像数据自监督学习方法的最新研究方向,重点介绍了其在医学图像分析领域中的应用。本文介绍了一套最新的计算机视觉领域的自监督学习方法,因为它们适用于医学影像分析,并将其分类为预测、生成和对比方法。此外,本文还介绍了医学影像分析中自我监督学习领域的(40)项最新研究,旨在阐明该领域的最新创新。最后,文章总结了该领域未来可能的研究方向。
摘要:The availability of high quality annotated medical imaging datasets is a
major problem that collides with machine learning applications in the field of
medical imaging analysis and impedes its advancement. Self-supervised learning
is a recent training paradigm that enables learning robust representations
without the need for human annotation which can be considered as an effective
solution for the scarcity in annotated medical data. This article reviews the
state-of-the-art research directions in self-supervised learning approaches for
image data with concentration on their applications in the field of medical
imaging analysis. The article covers a set of the most recent self-supervised
learning methods from the computer vision field as they are applicable to the
medical imaging analysis, and categorizes them as predictive, generative, and
contrastive approaches. Moreover, the article covers forty (40) of the most
recent studies in the field of self-supervised learning in medical imaging
analysis, aiming to shed light on recent innovations in the field. Ultimately,
the article concludes with possible future research directions in the field.
【12】 Hebbian Semi-Supervised Learning in a Sample Efficiency Setting
标题:样本有效性设置下的Hebbian半监督学习
链接:https://arxiv.org/abs/2103.09002
作者:Gabriele Lagani,Fabrizio Falchi,Claudio Gennaro,Giuseppe Amato
机构:Computer Science Department, University of Pisa, Pisa, Italy, ISTI - CNR, Pisa, Italy
备注:None
摘要:我们建议通过结合Hebbian学习和梯度下降的半监督训练策略解决深度卷积神经网络(DCNN)中的样本效率问题:使用基于Hebbian学习的无监督方法对所有内部层(卷积层和完全连接层)进行预训练,最后一个全连通层(分类层)采用随机梯度下降法(SGD)进行训练。事实上,由于Hebbian学习是一种无监督的学习方法,它的潜力在于可以在没有标签的情况下训练DCNN的内部层。只有最后一个完全连接的层必须使用标记的示例进行训练。我们在不同的样本效率下,在不同的对象识别数据集上进行了实验,比较了我们的半监督(Hebbian用于内部层+SGD用于最终完全连接层)方法与端到端监督backprop训练,以及基于变分自动编码器(VAE)的半监督学习。结果表明,在可用标记样本数较低的情况下,我们的半监督方法在几乎所有情况下都优于其他方法。
摘要:We propose to address the issue of sample efficiency, in Deep Convolutional
Neural Networks (DCNN), with a semi-supervised training strategy that combines
Hebbian learning with gradient descent: all internal layers (both convolutional
and fully connected) are pre-trained using an unsupervised approach based on
Hebbian learning, and the last fully connected layer (the classification layer)
is trained using Stochastic Gradient Descent (SGD). In fact, as Hebbian
learning is an unsupervised learning method, its potential lies in the
possibility of training the internal layers of a DCNN without labels. Only the
final fully connected layer has to be trained with labeled examples.
We performed experiments on various object recognition datasets, in different
regimes of sample efficiency, comparing our semi-supervised (Hebbian for
internal layers + SGD for the final fully connected layer) approach with
end-to-end supervised backprop training, and with semi-supervised learning
based on Variational Auto-Encoder (VAE). The results show that, in regimes
where the number of available labeled samples is low, our semi-supervised
approach outperforms the other approaches in almost all the cases.
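The unsupervised pre-training step can be illustrated with a simple Hebbian update; the sketch below uses Oja's rule (one common stabilized Hebbian rule, chosen here for illustration rather than taken from the paper) on random data.

```python
import numpy as np

rng = np.random.default_rng(0)

def oja_update(W, x, lr=1e-2):
    # Hebbian step: weights grow with the input-output correlation y x^T;
    # Oja's decay term keeps them bounded without any label information.
    y = W @ x
    return W + lr * (np.outer(y, x) - (y ** 2)[:, None] * W)

# Label-free pre-training pass over unlabeled inputs, standing in for the
# pre-training of internal layers; a classifier on top would then be
# trained with SGD on the few labeled examples.
W = rng.normal(scale=0.1, size=(8, 32))
for _ in range(1000):
    W = oja_update(W, rng.normal(size=32))
print(np.linalg.norm(W, axis=1).round(2))   # row norms stay bounded
```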
迁移|Zero/Few/One-Shot|自适应(4篇)
【1】 Few-Shot Emotion Recognition in Conversation with Sequential Prototypical Networks
标题:基于序列原型网络的会话少样本情感识别
链接:https://arxiv.org/abs/2109.09366
作者:Gaël Guibon,Matthieu Labeau,Hélène Flamein,Luce Lefeuvre,Chloé Clavel
机构:LTCI, Télécom-Paris, Institut Polytechnique de Paris, Direction Innovation & Recherche SNCF
备注:None
摘要:最近几项关于二元人际互动的研究都是针对没有特定商业目标的对话进行的。然而,许多公司可能受益于专门针对更精确环境的研究,如售后服务或客户满意度调查。在这项工作中,我们将自己置于实时聊天客户服务的范围内,我们希望在其中检测谈话流中的情绪及其演变。这一背景带来了多重挑战,从利用受限的、小的且大部分未标记的数据集到寻找和适应此类背景的方法。我们通过使用少量镜头学习来应对这些挑战,同时假设它可以为不同语言和稀疏标签的会话情感分类服务。我们提出了一种用于会话序列标记的原型网络变体,我们将其命名为ProtoSeq。我们在两个不同语言的数据集上测试了这种方法:英语的日常会话和法语的客户服务聊天。当应用于会话中的情感分类时,我们的方法被证明是有竞争力的,即使与其他方法相比也是如此。
摘要:Several recent studies on dyadic human-human interactions have been done on
conversations without specific business objectives. However, many companies
might benefit from studies dedicated to more precise environments such as after
sales services or customer satisfaction surveys. In this work, we place
ourselves in the scope of a live chat customer service in which we want to
detect emotions and their evolution in the conversation flow. This context
leads to multiple challenges that range from exploiting restricted, small and
mostly unlabeled datasets to finding and adapting methods for such a context. We
tackle these challenges by using Few-Shot Learning while making the hypothesis
it can serve conversational emotion classification for different languages and
sparse labels. We contribute by proposing a variation of Prototypical Networks
for sequence labeling in conversation that we name ProtoSeq. We test this
method on two datasets with different languages: daily conversations in English
and customer service chat conversations in French. When applied to emotion
classification in conversations, our method proved to be competitive even when
compared to existing approaches.
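The prototypical-network core that ProtoSeq builds on can be sketched in a few lines; the episode sizes and embedding dimension below are arbitrary stand-ins, and real inputs would be utterance embeddings from the conversation encoder.

```python
import torch

def prototypes(support, labels, n_classes):
    # Class prototype = mean embedding of that class's support examples.
    return torch.stack([support[labels == c].mean(0) for c in range(n_classes)])

def classify(queries, protos):
    # Nearest-prototype classification via squared Euclidean distance.
    d = torch.cdist(queries, protos) ** 2        # (n_query, n_classes)
    return (-d).softmax(dim=1)                   # distances -> probabilities

# A 3-way 5-shot episode with 16-dim stand-in embeddings.
support = torch.randn(15, 16)
labels = torch.arange(3).repeat_interleave(5)    # 5 shots per emotion class
probs = classify(torch.randn(4, 16), prototypes(support, labels, 3))
print(probs.argmax(dim=1))
```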
【2】 Ontology-based n-ball Concept Embeddings Informing Few-shot Image Classification
标题:基于本体的n-球概念嵌入辅助少样本图像分类
链接:https://arxiv.org/abs/2109.09063
作者:Mirantha Jayathilaka,Tingting Mu,Uli Sattler
机构:Department of Computer Science, The University of Manchester, UK
备注:None
摘要:我们提出了一个新的框架ViOCE,它将基于本体的背景知识以$n$-ball概念的形式嵌入到基于神经网络的视觉体系结构中。该方法由两部分组成:通过学习n-球嵌入将本体的符号知识转换为连续空间,该嵌入捕获包容和不相交属性,并使用所学嵌入指导视觉模型的训练和推理。我们使用Few-Shot图像分类任务来评估ViOCE,在两个标准基准上,ViOCE表现出了优异的性能。
摘要:We propose a novel framework named ViOCE that integrates ontology-based
background knowledge in the form of $n$-ball concept embeddings into a neural
network based vision architecture. The approach consists of two components -
converting symbolic knowledge of an ontology into continuous space by learning
n-ball embeddings that capture properties of subsumption and disjointness, and
guiding the training and inference of a vision model using the learnt
embeddings. We evaluate ViOCE using the task of few-shot image classification,
where it demonstrates superior performance on two standard benchmarks.
【3】 Augmenting semantic lexicons using word embeddings and transfer learning
标题:利用词嵌入和迁移学习扩充语义词典
链接:https://arxiv.org/abs/2109.09010
作者:Thayer Alshaabi,Colin Van Oort,Mikaela Fudolig,Michael V. Arnold,Christopher M. Danforth,Peter Sheridan Dodds
机构:Vermont Complex Systems Center, University of Vermont, Burlington, VT ,., Department of Computer Science, University of Vermont, Burlington, VT ,., Department of Mathematics & Statistics, University of Vermont, Burlington, VT ,., )
备注:16 pages, 9 figures
摘要:情感感知智能系统对于广泛的应用至关重要,包括营销、政治活动、推荐系统、行为经济学、社会心理学和国家安全。这些情感感知智能系统由语言模型驱动,语言模型大致分为两种范式:1)基于词典的;2)上下文的。尽管最近的上下文模型越来越占主导地位,但由于其可解释性和易用性,我们仍然看到对基于词典的模型的需求。例如,基于词典的模型使研究人员能够轻松确定哪些单词和短语对测量情绪的变化贡献最大。任何基于词典的方法都面临一个挑战,那就是词典需要经常用新词和表达来扩展。语义词典的众包注释可能是一项昂贵且耗时的任务。在这里,我们提出了两个预测情感分数的模型,以相对较低的成本使用单词嵌入和迁移学习来扩充语义词典。我们的第一个模型使用一个简单的浅层神经网络建立了一个基线,该网络使用一种非上下文的方法,通过预先训练的单词嵌入进行初始化。我们的第二个模型改进了我们的基线,具有一个基于深度Transformer的网络,该网络引入单词定义来估计它们的词汇极性。我们的评估表明,这两种模型能够以与Amazon Mechanical Turk的审稿人相似的准确度对新词进行评分,但成本仅为后者的一小部分。
摘要:Sentiment-aware intelligent systems are essential to a wide array of
applications including marketing, political campaigns, recommender systems,
behavioral economics, social psychology, and national security. These
sentiment-aware intelligent systems are driven by language models which broadly
fall into two paradigms: 1. Lexicon-based and 2. Contextual. Although recent
contextual models are increasingly dominant, we still see demand for
lexicon-based models because of their interpretability and ease of use. For
example, lexicon-based models allow researchers to readily determine which
words and phrases contribute most to a change in measured sentiment. A
challenge for any lexicon-based approach is that the lexicon needs to be
routinely expanded with new words and expressions. Crowdsourcing annotations
for semantic dictionaries may be an expensive and time-consuming task. Here, we
propose two models for predicting sentiment scores to augment semantic lexicons
at a relatively low cost using word embeddings and transfer learning. Our first
model establishes a baseline employing a simple and shallow neural network
initialized with pre-trained word embeddings using a non-contextual approach.
Our second model improves upon our baseline, featuring a deep Transformer-based
network that brings to bear word definitions to estimate their lexical
polarity. Our evaluation shows that both models are able to score new words
with a similar accuracy to reviewers from Amazon Mechanical Turk, but at a
fraction of the cost.
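The baseline described above (a shallow network over frozen pre-trained embeddings) can be sketched as follows; the five words, their embeddings, and their sentiment scores are synthetic stand-ins for a real embedding table and a crowd-annotated seed lexicon.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Stand-ins: random vectors in place of pre-trained word embeddings, and
# made-up scores in place of crowd-sourced sentiment ratings.
emb = {w: rng.normal(size=50) for w in ["good", "bad", "great", "awful", "okay"]}
scores = {"good": 0.8, "bad": -0.7, "great": 0.9, "awful": -0.9, "okay": 0.1}

X = np.stack([emb[w] for w in scores])
y = np.array(list(scores.values()))

model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
model.fit(X, y)

# A new word is scored by passing its frozen embedding through the network.
print(model.predict(emb["okay"][None, :]).round(2))
```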
【4】 Towards Zero and Few-shot Knowledge-seeking Turn Detection in Task-orientated Dialogue Systems
标题:面向任务的对话系统中零样本与少样本知识寻求话轮检测
链接:https://arxiv.org/abs/2109.08820
作者:Di Jin,Shuyang Gao,Seokhwan Kim,Yang Liu,Dilek Hakkani-Tur
机构:Amazon Alexa AI, Sunnyvale, CA, USA
备注:To appear at NLP4ConvAI workshop of EMNLP 2021
摘要:大多数以前关于面向任务的对话系统的工作仅限于支持领域API。但是,用户可能有超出这些API范围的请求。这项工作的重点是识别这样的用户请求。现有的这项任务的方法主要依赖于对大型带注释数据的预训练模型进行微调。我们提出了一种基于自适应表示学习和密度估计的新方法REDE。REDE可以应用于零样本情况,并且只需更新少于3K的参数、使用少量样本即可快速学习出高性能检测器。我们在DSTC9数据和我们新收集的测试集上展示了REDE的竞争性能。
摘要:Most prior work on task-oriented dialogue systems is restricted to supporting
domain APIs. However, users may have requests that are out of the scope of
these APIs. This work focuses on identifying such user requests. Existing
methods for this task mainly rely on fine-tuning pre-trained models on large
annotated data. We propose a novel method, REDE, based on adaptive
representation learning and density estimation. REDE can be applied to
zero-shot cases, and quickly learns a high-performing detector with only a few
shots by updating less than 3K parameters. We demonstrate REDE's competitive
performance on DSTC9 data and our newly collected test set.
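A density-estimation view of out-of-scope detection can be sketched as below; this fits a single Gaussian to in-scope turn representations and thresholds the Mahalanobis distance, which is a generic stand-in for REDE's actual estimator.

```python
import numpy as np

def fit_gaussian(Z):
    # Fit a Gaussian density to representations of in-scope (API-covered) turns.
    mu = Z.mean(axis=0)
    cov = np.cov(Z, rowvar=False) + 1e-6 * np.eye(Z.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis_sq(z, mu, prec):
    d = z - mu
    return float(d @ prec @ d)

rng = np.random.default_rng(0)
in_scope = rng.normal(0, 1, size=(500, 16))     # stand-in representations
mu, prec = fit_gaussian(in_scope)

# A turn whose representation is far from the in-scope density is flagged
# as a knowledge-seeking (out-of-API-scope) request.
scores = [mahalanobis_sq(z, mu, prec) for z in in_scope]
threshold = np.quantile(scores, 0.95)
query = rng.normal(3, 1, size=16)
print(mahalanobis_sq(query, mu, prec) > threshold)   # True: out of scope
```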
强化学习(8篇)
【1】 Deep Reinforcement Learning Based Multidimensional Resource Management for Energy Harvesting Cognitive NOMA Communications
标题:基于深度强化学习的能量采集认知NOMA通信多维资源管理
链接:https://arxiv.org/abs/2109.09503
作者:Zhaoyuan Shi,Xianzhong Xie,Huabing Lu,Helin Yang,Jun Cai,Zhiguo Ding
机构: Nanyang Technological University
备注:35 pages, 12 figures
摘要:能量收集(EH)、认知无线电(CR)和非正交多址(NOMA)的结合是提高即将到来的超五代网络(B5G)的能量效率和频谱效率的一个有希望的解决方案,特别是在支持物联网(IoT)系统中的无线传感器通信方面。然而,如何实现智能频率、时间和能量资源分配以支持更好的性能是一个需要解决的重要问题。在本文中,我们研究了EH-CR-NOMA物联网系统的联合频谱、能量和时间资源管理。我们的目标是在满足主用户(PU)和SSU的最大充电电池容量、最大发射功率、最大缓冲容量和最小数据速率的约束条件下,使所有辅助传感用户(SSU)的数据包丢失数量最小化。由于该优化问题的非凸性和无线环境的随机性,我们提出了一种基于深度强化学习(DRL)的分布式多维资源管理算法。考虑到所管理资源的连续性,采用了深度确定性策略梯度(DDPG)算法,基于该算法,每个代理(SSU)可以在不协作的情况下管理自己的多维资源。此外,为了提高训练效率和电池性能保护,还引入了一种简化但实用的动作调节器(AA)。结果表明,该算法的收敛速度比DDPG算法快4倍左右,平均丢包数(ANPL)比贪婪算法低8倍左右。
摘要:The combination of energy harvesting (EH), cognitive radio (CR), and
non-orthogonal multiple access (NOMA) is a promising solution to improve energy
efficiency and spectral efficiency of the upcoming beyond fifth generation
network (B5G), especially for supporting wireless sensor communications in the
Internet of things (IoT) system. However, how to realize intelligent frequency,
time, and energy resource allocation to support better performances is an
important problem to be solved. In this paper, we study joint spectrum, energy,
and time resource management for the EH-CR-NOMA IoT systems. Our goal is to
minimize the number of data packets losses for all secondary sensing users
(SSU), while satisfying the constraints on the maximum charging battery
capacity, maximum transmitting power, maximum buffer capacity, and minimum data
rate of primary users (PU) and SSUs. Due to the non-convexity of this
optimization problem and the stochastic nature of the wireless environment, we
propose a distributed multidimensional resource management algorithm based on
deep reinforcement learning (DRL). Considering the continuity of the resources
to be managed, the deep deterministic policy gradient (DDPG) algorithm is
adopted, based on which each agent (SSU) can manage its own multidimensional
resources without collaboration. In addition, a simplified but practical action
adjuster (AA) is introduced for improving the training efficiency and battery
performance protection. The provided results show that the convergence speed of
the proposed algorithm is about 4 times faster than that of DDPG, and the
average number of packet losses (ANPL) is about 8 times lower than that of the
greedy algorithm.
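Since the resource variables are continuous, the paper adopts DDPG; the skeleton below shows the two core DDPG updates (critic regression and deterministic policy gradient) on random data, with target networks and replay mechanics omitted for brevity and the observation/action sizes chosen arbitrarily.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 3   # e.g. per-agent spectrum/energy/time observations, actions

actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_step(s, a, r, s2, gamma=0.99):
    # Critic: regress Q(s, a) toward r + gamma * Q(s', actor(s')).
    with torch.no_grad():
        target = r + gamma * critic(torch.cat([s2, actor(s2)], dim=1))
    q_loss = ((critic(torch.cat([s, a], dim=1)) - target) ** 2).mean()
    opt_c.zero_grad(); q_loss.backward(); opt_c.step()
    # Actor: ascend the critic's value of its own continuous actions.
    a_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    opt_a.zero_grad(); a_loss.backward(); opt_a.step()

ddpg_step(torch.randn(32, obs_dim), torch.rand(32, act_dim),
          torch.randn(32, 1), torch.randn(32, obs_dim))
```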
【2】 Lifelong Robotic Reinforcement Learning by Retaining Experiences
标题:保留经验的终身机器人强化学习
链接:https://arxiv.org/abs/2109.09180
作者:Annie Xie,Chelsea Finn
机构: 1Stanford University
备注:Supplementary website at this https URL
摘要:理想情况下,多任务学习允许机器人获得多种有用技能。然而,许多多任务强化学习工作假设机器人可以随时从所有任务收集数据。实际上,机器人学习的任务是按顺序到达的,这取决于用户和机器人的当前环境。在这项工作中,我们研究了一个实际的顺序多任务RL问题,该问题是由物理机器人系统的实际约束所驱动的,并推导出了一种有效地利用为先前任务学习的数据和策略来累积增长机器人技能集的方法。在一系列模拟机器人操作实验中,我们的方法需要的样本少于从头开始学习每个任务的一半,同时避免了不切实际的循环数据收集。在Franka Emika熊猫机器人手臂上,我们的方法逐步学习十项具有挑战性的任务,包括瓶盖和块插入。
摘要:Multi-task learning ideally allows robots to acquire a diverse repertoire of
useful skills. However, many multi-task reinforcement learning efforts assume
the robot can collect data from all tasks at all times. In reality, the tasks
that the robot learns arrive sequentially, depending on the user and the
robot's current environment. In this work, we study a practical sequential
multi-task RL problem that is motivated by the practical constraints of
physical robotic systems, and derive an approach that effectively leverages the
data and policies learned for previous tasks to cumulatively grow the robot's
skill-set. In a series of simulated robotic manipulation experiments, our
approach requires less than half the samples than learning each task from
scratch, while avoiding impractical round-robin data collection. On a Franka
Emika Panda robot arm, our approach incrementally learns ten challenging tasks,
including bottle capping and block insertion.
【3】 Regularize! Don't Mix: Multi-Agent Reinforcement Learning without Explicit Centralized Structures
标题:正规化!不要混用:没有显式集中结构的多Agent强化学习
链接:https://arxiv.org/abs/2109.09038
作者:Chapman Siu,Jason Traish,Richard Yi Da Xu
机构:University of Technology Sydney, Australia
摘要:我们提出在多智能体强化学习中使用正则化而非学习显式协作结构,该方法称为{\em Multi-Agent Regularized Q-learning}(MARQ)。许多MARL方法利用集中式结构来利用全局状态信息,或在代理以分散方式行动时消除通信约束。我们不学习在代理执行时会被移除的冗余结构,而是建议利用代理的共享经验来正则化各个单独的策略,以促进结构化探索。我们研究了几种不同的方法,以了解MARQ如何在多代理环境中显式或隐式地正则化我们的策略。MARQ旨在通过应用正则化约束来解决MARL环境中的这些限制,这些约束可以纠正离策略、分布外代理经验中的偏差,并促进多样化的探索。我们的算法在几个基准多代理环境中进行了评估,结果表明MARQ始终优于多个基线和最先进的算法;以更少的步骤学习并收敛到更高的回报。
摘要:We propose using regularization for Multi-Agent Reinforcement Learning rather
than learning explicit cooperative structures called {\em Multi-Agent
Regularized Q-learning} (MARQ). Many MARL approaches leverage centralized
structures in order to exploit global state information or to remove
communication constraints when the agents act in a decentralized manner.
Instead of learning redundant structures which are removed during agent
execution, we propose to leverage the shared experiences of the agents to
regularize the individual policies in order to promote structured exploration.
We examine several different approaches to how MARQ can either explicitly or
implicitly regularize our policies in a multi-agent setting. MARQ aims to
address these limitations in the MARL context through applying regularization
constraints which can correct bias in off-policy out-of-distribution agent
experiences and promote diverse exploration. Our algorithm is evaluated on
several benchmark multi-agent environments and we show that MARQ consistently
outperforms several baselines and state-of-the-art algorithms; learning in
fewer steps and converging to higher returns.
【4】 Dual Behavior Regularized Reinforcement Learning
标题:双重行为正则化强化学习
链接:https://arxiv.org/abs/2109.09037
作者:Chapman Siu,Jason Traish,Richard Yi Da Xu
机构:University of Technology Sydney, Australia
摘要:强化学习已被证明可以通过与环境的交互或收集的经验来执行一系列复杂的任务。然而,这些方法中的许多都假定了最佳或接近最佳的体验或一致环境的存在。在这项工作中,我们提出了基于反事实后悔最小化的双重、基于优势的行为策略。我们展示了这种方法的灵活性,以及它如何适应在线环境,其中环境可用于收集经验和各种其他环境。我们证明了这种新算法在基于一系列连续环境的不同环境中优于几种强基线模型。与其他合理的修改相比,额外的烧蚀提供了关于我们的双行为正则化强化学习方法是如何设计的见解,并证明了其推广能力。
摘要:Reinforcement learning has been shown to perform a range of complex tasks
through interaction with an environment or by leveraging collected experience.
However, many of these approaches presume optimal or near optimal experiences
or the presence of a consistent environment. In this work we propose dual,
advantage-based behavior policy based on counterfactual regret minimization. We
demonstrate the flexibility of this approach and how it can be adapted to
online contexts where the environment is available to collect experiences and a
variety of other contexts. We demonstrate this new algorithm can outperform
several strong baseline models in different contexts based on a range of
continuous environments. Additional ablations provide insights into how our
dual behavior regularized reinforcement learning approach is designed compared
with other plausible modifications and demonstrates its ability to generalize.
【5】 Greedy UnMixing for Q-Learning in Multi-Agent Reinforcement Learning
标题:多智能体强化学习中Q-学习的贪婪解混(Greedy UnMix)
链接:https://arxiv.org/abs/2109.09034
作者:Chapman Siu,Jason Traish,Richard Yi Da Xu
机构:University of Technology Sydney, Australia
摘要:本文介绍了一种用于协作式多智能体强化学习(MARL)的贪婪解混方法Greedy UnMix(GUM)。GUM旨在避免MARL方法因对大型联合状态-动作空间中的价值高估而失败的情况。它通过一种保守的Q-学习方法来解决这一问题:限制数据集中的状态边缘分布以避免未观察到的联合状态-动作空间,同时在集中式训练、分散执行范式下尝试对问题空间进行解混或简化。我们证明了在MARL场景的Q-学习中Q-函数下界得到遵守,并在一组基准MARL任务上展示了优于现有Q-学习MARL方法以及更一般MARL算法的性能,尽管与最先进的方法相比该方法相对简单。
摘要:This paper introduces Greedy UnMix (GUM) for cooperative multi-agent
reinforcement learning (MARL). Greedy UnMix aims to avoid scenarios where MARL
methods fail due to overestimation of values as part of the large joint
state-action space. It aims to address this through a conservative Q-learning
approach through restricting the state-marginal in the dataset to avoid
unobserved joint state action spaces, whilst concurrently attempting to unmix
or simplify the problem space under the centralized training with decentralized
execution paradigm. We demonstrate the adherence to Q-function lower bounds in
the Q-learning for MARL scenarios, and demonstrate superior performance to
existing Q-learning MARL approaches as well as more general MARL algorithms
over a set of benchmark MARL tasks, despite its relative simplicity compared
with state-of-the-art approaches.
【6】 Hindsight Foresight Relabeling for Meta-Reinforcement Learning
标题:元强化学习的后见之明前瞻重新标注
链接:https://arxiv.org/abs/2109.09031
作者:Michael Wan,Jian Peng,Tanmay Gangwani
机构:University of Illinois at Urbana-Champaign
摘要:元强化学习(Meta-RL)算法允许代理从少量的经验中学习新的行为,从而缓解RL中的样本效率低下问题。然而,尽管元RL代理在经历了少量的轨迹后,可以在测试时快速适应新任务,但元训练过程仍然效率低下。先前的研究发现,在多任务RL环境中,重新标记过去的转换,从而在任务之间共享经验,可以提高样本效率和渐近性能。我们将这一思想应用到meta RL设置中,并设计了一种新的重新标记方法,称为事后预见重新标记(HFR)。我们结合“后见之明”和“预见”构建了一个重新标记分布,后见之明用于使用训练任务分布中的奖励函数重新标记轨迹,而“预见”用于获取重新标记的轨迹并计算每个任务的每条轨迹的效用。HFR易于实现,并且易于与现有的meta-RL算法兼容。我们发现,与其他重新标记方法相比,HFR在各种元RL任务上的性能都有所提高。
摘要:Meta-reinforcement learning (meta-RL) algorithms allow for agents to learn
new behaviors from small amounts of experience, mitigating the sample
inefficiency problem in RL. However, while meta-RL agents can adapt quickly to
new tasks at test time after experiencing only a few trajectories, the
meta-training process is still sample-inefficient. Prior works have found that
in the multi-task RL setting, relabeling past transitions and thus sharing
experience among tasks can improve sample efficiency and asymptotic
performance. We apply this idea to the meta-RL setting and devise a new
relabeling method called Hindsight Foresight Relabeling (HFR). We construct a
relabeling distribution using the combination of "hindsight", which is used to
relabel trajectories using reward functions from the training task
distribution, and "foresight", which takes the relabeled trajectories and
computes the utility of each trajectory for each task. HFR is easy to implement
and readily compatible with existing meta-RL algorithms. We find that HFR
improves performance when compared to other relabeling methods on a variety of
meta-RL tasks.
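The hindsight/foresight split can be illustrated schematically: re-score a collected trajectory under every training task's reward function, then rank tasks by the relabeled return. The toy tasks and reward functions below are invented for illustration; the paper's utility computation is more involved.

```python
import numpy as np

def relabel(trajectory, reward_fns):
    # Hindsight: re-score one trajectory under each task's reward function,
    # so a single rollout yields training signal for every task.
    return {task: [fn(s, a) for s, a in trajectory]
            for task, fn in reward_fns.items()}

def best_task(relabeled):
    # Foresight (simplified): the trajectory is deemed most useful for the
    # task under which it earns the highest relabeled return.
    return max(relabeled, key=lambda t: sum(relabeled[t]))

rng = np.random.default_rng(0)
traj = [(rng.normal(size=2), rng.normal(size=1)) for _ in range(10)]
tasks = {"reach_left": lambda s, a: -abs(s[0] + 1.0),
         "reach_right": lambda s, a: -abs(s[0] - 1.0)}
print("most useful for:", best_task(relabel(traj, tasks)))
```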
【7】 Exploring the Robustness of Distributional Reinforcement Learning against Noisy State Observations
标题:分布式强化学习对噪声状态观测的鲁棒性研究
链接:https://arxiv.org/abs/2109.08776
作者:Ke Sun,Yi Liu,Yingnan Zhao,Hengshuai Yao,Shangling Jui,Linglong Kong
机构:Department of Mathematical and Statistical Sciences, University of Alberta, Department of Computer Science and Technology, Harbin Institute of Technology, Huawei Technologies, Huawei Kirin Solution
摘要:在真实场景中,智能体观察到的状态观测可能包含测量误差或对抗性噪声,误导智能体采取次优行动,甚至在训练时崩溃。在本文中,我们研究了分布强化学习(RL)的训练稳健性,这是一类最先进的方法,用于估计总收益的整个分布,而不仅仅是期望值。首先,我们在表格情况下提出状态噪声马尔可夫决策过程(SN-MDP),将随机和对抗性状态观测噪声结合在一起,其中导出了基于期望和分布Bellman算子的收缩性。除了使用函数近似的SN-MDP外,我们还从理论上描述了基于直方图的分布损失的有界梯度范数,说明了分布RL具有更好的训练鲁棒性。我们还提供了在更灵活的状态噪声下更严格的时间差分(TD)学习收敛条件,以及利用影响函数进行的灵敏度分析。最后,在一组博弈上的大量实验表明,在各种状态观测噪声下,分布RL比基于期望的RL具有更好的训练鲁棒性。
摘要:In real scenarios, state observations that an agent observes may contain
measurement errors or adversarial noises, misleading the agent to take
suboptimal actions or even collapse while training. In this paper, we study the
training robustness of distributional Reinforcement Learning~(RL), a class of
state-of-the-art methods that estimate the whole distribution, as opposed to
only the expectation, of the total return. Firstly, we propose State-Noisy
Markov Decision Process~(SN-MDP) in the tabular case to incorporate both random
and adversarial state observation noises, in which the contraction of both
expectation-based and distributional Bellman operators is derived. Beyond
SN-MDP with the function approximation, we theoretically characterize the
bounded gradient norm of histogram-based distributional loss, accounting for
the better training robustness of distributional RL. We also provide stricter
convergence conditions of the Temporal-Difference~(TD) learning under more
flexible state noises, as well as the sensitivity analysis by the leverage of
influence function. Finally, extensive experiments on the suite of games show
that distributional RL enjoys better training robustness compared with its
expectation-based counterpart across various state observation noises.
【8】 A Reinforcement Learning Approach to the Stochastic Cutting Stock Problem
标题:随机下料问题的强化学习方法
链接:https://arxiv.org/abs/2109.09592
作者:Anselmo R. Pitombeira-Neto,Arthur H. Fonseca Murta
机构:Department of Industrial Engineering, Federal University of Ceará, Campus do Pici, Fortaleza, Brazil, Graduate Program in Modeling and Quantitative Methods, Federal University of Ceará, Fortaleza, Brazil
备注:22 pages, 6 figures
摘要:我们提出了一个将随机下料问题描述为折扣无限期马尔可夫决策过程的公式。在每个决策阶段,给定当前的物品库存,一个代理根据未知的需求选择以何种模式切割库存中的物品。最优解对应于将每个状态与决策关联并使预期总成本最小化的策略。由于精确算法随状态空间维数呈指数级增长,我们提出了一种基于强化学习的启发式求解方法。我们提出了一种近似策略迭代算法,其中应用线性模型来近似策略的动作价值函数。策略评估通过从仿真获得的状态转移、决策和成本样本中求解投影Bellman方程来执行。由于决策空间大,采用交叉熵方法进行策略改进。使用真实数据进行了计算实验,以说明该算法的应用。将用多项式和傅里叶基函数得到的启发式策略与短视策略和随机策略进行了比较。结果表明,有可能获得能够充分控制库存的策略,其平均成本最多比短视策略低80%。
摘要:We propose a formulation of the stochastic cutting stock problem as a
discounted infinite-horizon Markov decision process. At each decision epoch,
given current inventory of items, an agent chooses in which patterns to cut
objects in stock in anticipation of the unknown demand. An optimal solution
corresponds to a policy that associates each state with a decision and
minimizes the expected total cost. Since exact algorithms scale exponentially
with the state-space dimension, we develop a heuristic solution approach based
on reinforcement learning. We propose an approximate policy iteration algorithm
in which we apply a linear model to approximate the action-value function of a
policy. Policy evaluation is performed by solving the projected Bellman
equation from a sample of state transitions, decisions and costs obtained by
simulation. Due to the large decision space, policy improvement is performed
via the cross-entropy method. Computational experiments are carried out with
the use of realistic data to illustrate the application of the algorithm.
Heuristic policies obtained with polynomial and Fourier basis functions are
compared with myopic and random policies. Results indicate the possibility of
obtaining policies capable of adequately controlling inventories with an
average cost up to 80% lower than the cost obtained by a myopic policy.
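The policy-evaluation step (solving the projected Bellman equation for linear value weights from simulated transitions) admits a compact least-squares form; the sketch below is a generic LSTD-style solver on random features, not the paper's implementation.

```python
import numpy as np

def lstd(phi, phi_next, costs, gamma=0.95, ridge=1e-6):
    # Solve Phi w ~= c + gamma * Phi' w (the projected Bellman equation)
    # in the least-squares sense from sampled transitions.
    A = phi.T @ (phi - gamma * phi_next) + ridge * np.eye(phi.shape[1])
    b = phi.T @ costs
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
n, k = 1000, 8                       # sampled transitions, basis functions
phi = rng.normal(size=(n, k))        # features of visited states/decisions
phi_next = rng.normal(size=(n, k))   # features of successor states
costs = rng.uniform(size=n)
print(lstd(phi, phi_next, costs).round(3))
```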
符号|符号学习(1篇)
【1】 Combining Rules and Embeddings via Neuro-Symbolic AI for Knowledge Base Completion
标题:基于神经符号人工智能的规则与嵌入相结合的知识库补全
链接:https://arxiv.org/abs/2109.09566
作者:Prithviraj Sen,Breno W. S. R. Carvalho,Ibrahim Abdelaziz,Pavan Kapanipathi,Francois Luus,Salim Roukos,Alexander Gray
机构:IBM Research
摘要:最近对知识库补全(KBC)的兴趣催生了大量基于强化学习、归纳逻辑编程和图嵌入的方法。特别是,基于规则的KBC在产生可解释规则的同时,性能可与图嵌入相当。即使在基于规则的KBC内部,也存在不同的方法,产生质量各异的规则,而以前的工作并不总是精确地强调这些差异。困扰大多数基于规则的KBC的另一个问题是关系路径的不均匀性:一些关系序列只出现在极少数路径中,而另一些则出现得非常频繁。在本文中,我们证明了并非所有基于规则的KBC模型都是相同的,并提出了两种不同的方法,分别学习:1)关系的混合;2)路径的混合。当在通过将布尔逻辑扩展到实值逻辑来学习规则的神经符号AI之上实现时,后一种模型在平均倒数排名上比最先进的基于规则的KBC高2-10%。此外,为了解决关系路径的不均匀性,我们将基于规则的KBC与图嵌入相结合,进一步改进了结果,实现了两全其美。
摘要:Recent interest in Knowledge Base Completion (KBC) has led to a plethora of
approaches based on reinforcement learning, inductive logic programming and
graph embeddings. In particular, rule-based KBC has led to interpretable rules
while being comparable in performance with graph embeddings. Even within
rule-based KBC, there exist different approaches that lead to rules of varying
quality and previous work has not always been precise in highlighting these
differences. Another issue that plagues most rule-based KBC is the
non-uniformity of relation paths: some relation sequences occur in very few
paths while others appear very frequently. In this paper, we show that not all
rule-based KBC models are the same and propose two distinct approaches that
learn in one case: 1) a mixture of relations and the other 2) a mixture of
paths. When implemented on top of neuro-symbolic AI, which learns rules by
extending Boolean logic to real-valued logic, the latter model leads to
superior KBC accuracy outperforming state-of-the-art rule-based KBC by 2-10% in
terms of mean reciprocal rank. Furthermore, to address the non-uniformity of
relation paths, we combine rule-based KBC with graph embeddings thus improving
our results even further and achieving the best of both worlds.
医学相关(11篇)
【1】 FUTURE-AI: Guiding Principles and Consensus Recommendations for Trustworthy Artificial Intelligence in Future Medical Imaging
标题:未来-人工智能:未来医学影像中值得信赖的人工智能的指导原则和共识建议
链接:https://arxiv.org/abs/2109.09658
作者:Karim Lekadir,Richard Osuala,Catherine Gallin,Noussair Lazrak,Kaisar Kushibar,Gianna Tsakou,Susanna Aussó,Leonor Cerdá Alberich,Konstantinos Marias,Manolis Tsiknakis,Sara Colantonio,Nickolas Papanikolaou,Zohaib Salahuddin,Henry C Woodruff,Philippe Lambin,Luis Martí-Bonmatí
机构:Artificial Intelligence in Medicine Lab (BCN-AIM), University of, Maggioli SPA, Research and Development Lab, Athens, Greece, Foundation for Research and Technology, Institute of Computer Science, Greece
备注:46 pages
摘要:人工智能(AI)的最新进展与当今临床系统产生的大量数据相结合,导致了医疗成像整个价值链上成像AI解决方案的开发,包括图像重建、医学图像分割、基于图像的诊断和治疗规划。尽管人工智能在医学成像领域取得了成功并具有未来潜力,但许多利益相关者仍关注成像人工智能解决方案的潜在风险和道德影响,这些解决方案被视为复杂、不透明,在关键临床应用中难以理解、利用和信任。尽管存在这些问题和风险,但目前还没有具体的指南和最佳实践来指导医疗成像领域人工智能的未来发展,以提高信任度、安全性和采用率。为了弥合这一差距,本文介绍了从五大欧洲健康成像人工智能项目中积累的经验、共识和最佳实践中精心选择的指导原则。这些指导原则被命名为FUTURE-AI,其构建模块包括(i)公平性,(ii)普遍性,(iii)可追溯性,(iv)可用性,(v)健壮性和(vi)可解释性。在一步一步的方法中,这些指南被进一步转化为一个具体建议框架,用于在临床实践中指定、开发、评估和部署技术上、临床上和道德上值得信赖的人工智能解决方案。
摘要:The recent advancements in artificial intelligence (AI) combined with the
extensive amount of data generated by today's clinical systems, has led to the
development of imaging AI solutions across the whole value chain of medical
imaging, including image reconstruction, medical image segmentation,
image-based diagnosis and treatment planning. Notwithstanding the successes and
future potential of AI in medical imaging, many stakeholders are concerned of
the potential risks and ethical implications of imaging AI solutions, which are
perceived as complex, opaque, and difficult to comprehend, utilise, and trust
in critical clinical applications. Despite these concerns and risks, there are
currently no concrete guidelines and best practices for guiding future AI
developments in medical imaging towards increased trust, safety and adoption.
To bridge this gap, this paper introduces a careful selection of guiding
principles drawn from the accumulated experiences, consensus, and best
practices from five large European projects on AI in Health Imaging. These
guiding principles are named FUTURE-AI and its building blocks consist of (i)
Fairness, (ii) Universality, (iii) Traceability, (iv) Usability, (v) Robustness
and (vi) Explainability. In a step-by-step approach, these guidelines are
further translated into a framework of concrete recommendations for specifying,
developing, evaluating, and deploying technically, clinically and ethically
trustworthy AI solutions into clinical practice.
【2】 Investigating the Relationship Between World Development Indicators and the Occurrence of Disease Outbreaks in the 21st Century: A Case Study
标题:21世纪世界发展指标与疾病暴发关系的个案研究
链接:https://arxiv.org/abs/2109.09314
作者:Aboli Marathe,Harsh Sakhrani,Saloni Parekh
机构:Dept. of Computer Engineering, Pune Institute of Computer, Pune, India, Dept. of Information Technology
摘要:及时确定易受疾病爆发影响的社会经济部门,对有兴趣采取缓解爆发措施的市政当局和医疗工作者来说是一项重大挑战。这个问题传统上是通过研究小规模医疗数据中的异常来解决的。在本文中,我们利用2000-2019年全球历史数据,利用数据驱动模型确定世界发展指标趋势与疾病爆发之间的关系,并将其视为经典的监督分类问题。基于CART的特征选择以一种非正统的方式使用,以确定受疾病爆发影响的协变量,从而确定最脆弱的部门。结果包括对不同分类算法的综合分析,并表明疾病暴发发生与各种发展指标的大小之间的关系。
摘要:The timely identification of socio-economic sectors vulnerable to a disease
outbreak presents an important challenge to the civic authorities and
healthcare workers interested in outbreak mitigation measures. This problem was
traditionally solved by studying the aberrances in small-scale healthcare data.
In this paper, we leverage data driven models to determine the relationship
between the trends of World Development Indicators and occurrence of disease
outbreaks using worldwide historical data from 2000-2019, and treat it as a
classic supervised classification problem. CART based feature selection was
employed in an unorthodox fashion to determine the covariates getting affected
by the disease outbreak, thus giving the most vulnerable sectors. The result
involves a comprehensive analysis of different classification algorithms and is
indicative of the relationship between the disease outbreak occurrence and the
magnitudes of various development indicators.
【3】 Co-occurrence of medical conditions: Exposing patterns through probabilistic topic modeling of SNOMED codes
标题:医疗条件的共现:通过SNOMED代码的概率主题建模暴露模式
链接:https://arxiv.org/abs/2109.09199
作者:Moumita Bhattacharya,Claudine Jurkovitz,Hagit Shatkay
机构:a Computational Biomedicine Lab, Computer and Information Sciences, University of Delaware, Newark, DE, USA, b Value Institute, Christiana Care Health System, Newark, DE, USA
备注:None
摘要:与多种同时发生的健康状况相关的患者往往面临严重的并发症和较差的预后。同时发生的疾病在患有肾病的个体中尤其普遍,这种疾病越来越普遍,影响到美国13%的普通人群。这项研究的目的是利用概率框架来识别和描述患者中同时发生的医疗状况的模式。具体而言,我们以非传统的方式应用主题建模,以发现13000名以上肾脏疾病患者EHR中指定和记录的SNOMEDCT代码之间的关联。与大多数以前的主题建模工作不同,我们将该方法应用于代码,而不是自然语言。此外,我们还对这些主题进行定量评估,评估其紧密性和独特性,并评估我们结果的医学有效性。我们的实验表明,每个主题都有一些高度可能和唯一的疾病代码,这表明主题是紧密的。此外,每对主题之间的主题间距离通常很高,说明了显著性。最后,在医学文献中,大多数分组在一个主题中的编码条件确实被报道为同时发生。值得注意的是,我们的结果揭示了一些迄今为止尚未在医学文献中报道为相关的条件之间的间接关联。
摘要:Patients associated with multiple co-occurring health conditions often face
aggravated complications and less favorable outcomes. Co-occurring conditions
are especially prevalent among individuals suffering from kidney disease, an
increasingly widespread condition affecting 13% of the general population in
the US. This study aims to identify and characterize patterns of co-occurring
medical conditions in patients employing a probabilistic framework.
Specifically, we apply topic modeling in a non-traditional way to find
associations across SNOMED CT codes assigned and recorded in the EHRs of >13,000
patients diagnosed with kidney disease. Unlike most prior work on topic
modeling, we apply the method to codes rather than to natural language.
Moreover, we quantitatively evaluate the topics, assessing their tightness and
distinctiveness, and also assess the medical validity of our results. Our
experiments show that each topic is succinctly characterized by a few highly
probable and unique disease codes, indicating that the topics are tight.
Furthermore, inter-topic distance between each pair of topics is typically
high, illustrating distinctiveness. Last, most coded conditions grouped
together within a topic, are indeed reported to co-occur in the medical
literature. Notably, our results uncover a few indirect associations among
conditions that have hitherto not been reported as correlated in the medical
literature.
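Applying topic modeling to codes rather than natural language amounts to treating each patient's recorded codes as a bag-of-words document; the sketch below runs LDA over a handful of invented code lists (the codes and cohort are illustrative only, not from the study).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Each "document" is the bag of condition codes recorded for one patient;
# these code strings are made up for illustration.
patients = [
    "N18.3 I10 E11.9", "N18.3 I10 E78.5", "E11.9 E78.5 I10",
    "J44.9 J45.909", "J44.9 J45.909 I10",
]
vec = CountVectorizer(token_pattern=r"\S+")   # keep whole codes as tokens
X = vec.fit_transform(patients)
vocab = vec.get_feature_names_out()

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
for k, comp in enumerate(lda.components_):
    top = [vocab[i] for i in comp.argsort()[::-1][:3]]
    print(f"topic {k}: co-occurring codes {top}")
```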
【4】 Change of human mobility during COVID-19: A United States case study
标题:冠状病毒期间人类流动性的变化:美国案例研究
链接:https://arxiv.org/abs/2109.09022
作者:Justin Elarde,Joon-Seok Kim,Hamdi Kavak,Andreas Züfle,Taylor Anderson
机构: Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA, USA, Department of Computational and Data Sciences, George Mason University, Fairfax
备注:Currently under review at PLOS One
摘要:随着新冠病毒-19的出现以及由此产生的居家隔离指南与远程工作实践相结合,2020年的人类流动性受到了巨大影响。现有研究通常检查特定地点的流动性在特定时间点是增加还是减少,并将这些变化与某些流行病和政策事件联系起来。在本文中,我们使用流动足迹数据,通过五步过程研究美国的流动变化。(步骤1)提出在公共场所花费时间的变化量(Delta-TSPP),作为量化2019-2020年美国每个县每日流动性变化的度量。(步骤2)进行主成分分析(PCA),将每个县的Delta-TSPP时间序列降为流动性变化的低维潜在成分。(步骤3)进行聚类分析,以发现具有相似潜在成分的县。(步骤4)调查每个成分的局部和全局空间自相关。(步骤5)进行相关性分析,以调查各种人口特征和行为与流动模式的相关性。结果表明,通过将每个县描述为三个潜在成分的线性组合,我们可以解释美国所有县59%的流动趋势变化。具体而言,美国各县2020年的流动性变化可以解释为三个潜在成分的组合:1)长期流动性下降,2)流动性不变,以及3)短期流动性下降。我们观察到流动性变化的三个潜在成分与各种人口特征之间存在显著相关性,包括政治倾向、人口、新冠病毒-19病例和死亡以及失业。我们发现,我们的分析提供了对应对新冠病毒-19大流行的流动性变化的全面理解。
摘要:With the onset of COVID-19 and the resulting shelter in place guidelines
combined with remote working practices, human mobility in 2020 has been
dramatically impacted. Existing studies typically examine whether mobility in
specific localities increases or decreases at specific points in time and
relate these changes to certain pandemic and policy events. In this paper, we
study mobility change in the US through a five-step process using mobility
footprint data. (Step 1) Propose the delta Time Spent in Public Places
(Delta-TSPP) as a measure to quantify daily changes in mobility for each US
county from 2019-2020. (Step 2) Conduct Principal Component Analysis (PCA) to
reduce the Delta-TSPP time series of each county to lower-dimensional latent
components of change in mobility. (Step 3) Conduct clustering analysis to find
counties that exhibit similar latent components. (Step 4) Investigate local and
global spatial autocorrelation for each component. (Step 5) Conduct correlation
analysis to investigate how various population characteristics and behavior
correlate with mobility patterns. Results show that by describing each county
as a linear combination of the three latent components, we can explain 59% of
the variation in mobility trends across all US counties. Specifically, change
in mobility in 2020 for US counties can be explained as a combination of three
latent components: 1) long-term reduction in mobility, 2) no change in
mobility, and 3) short-term reduction in mobility. We observe significant
correlations between the three latent components of mobility change and various
population characteristics, including political leaning, population, COVID-19
cases and deaths, and unemployment. We find that our analysis provides a
comprehensive understanding of mobility change in response to the COVID-19
pandemic.
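Steps 2 and 3 of the pipeline (PCA on each county's change-in-mobility series, then clustering the component loadings) can be sketched generically; the synthetic array below stands in for the Delta-TSPP time series.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Rows: counties; columns: a year of daily mobility-change values (synthetic).
delta_tspp = rng.normal(size=(300, 365)).cumsum(axis=1)

pca = PCA(n_components=3)
loadings = pca.fit_transform(delta_tspp)      # each county as 3 loadings
print("variance explained:", pca.explained_variance_ratio_.sum().round(2))

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(loadings)
print(np.bincount(clusters))                  # counties per mobility pattern
```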
【5】 Development of patients triage algorithm from nationwide COVID-19 registry data based on machine learning
标题:基于机器学习的全国冠状病毒登记数据患者分流算法研究
链接:https://arxiv.org/abs/2109.09001
作者:Hyung Ju Hwang,Seyoung Jung,Min Sue Park,Hyeontae Jo
机构:Department of Mathematics, Pohang University of Science and Technology, Pohang ,-, Republic of Korea
备注:8 pages, 8 figures, 1 table
摘要:传染病确诊患者的即时严重性评估模型可以实现有效诊断,减轻医疗系统的负担。本文介绍了利用机器学习技术建立严重程度评估模型的过程及其在SARS-CoV-2患者中的应用。在此,我们强调,我们的模型只需要患者的基本个人数据,使他们能够判断自己的严重程度。我们选择基于boosting的决策树模型作为分类器,并在建模后将死亡率解释为概率分数。具体来说,确定树模型结构的超参数使用贝叶斯优化技术进行调整,而不需要任何医疗信息知识。因此,我们测量了模型性能,并通过模型确定了影响严重性的变量。最后,我们的目标是建立一个医疗系统,允许患者检查自己的严重程度,并根据其他类似严重程度患者的过去治疗细节通知他们访问相应的诊所中心。
摘要:Prompt severity assessment model of confirmed patients who were infected with
infectious diseases could enable efficient diagnosis and alleviate the burden
on the medical system. This paper provides the development processes of the
severity assessment model using machine learning techniques and its application
on SARS-CoV-2 patients. Here, we highlight that our model only requires
patients' basic personal data, allowing them to judge their own severity.
We selected the boosting-based decision tree model as a classifier and
interpreted mortality as a probability score after modeling. Specifically,
hyperparameters that determine the structure of the tree model were tuned using
the Bayesian optimization technique without any knowledge of medical
information. As a result, we measured model performance and identified the
variables affecting the severity through the model. Finally, we aim to
establish a medical system that allows patients to check their own severity and
informs them to visit the appropriate clinic center based on the past treatment
details of other patients with similar severity.
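The modeling recipe (a boosting-based tree classifier tuned over tree-structure hyperparameters, with mortality read off as a probability score) can be sketched as follows; synthetic features replace the registry data, and a randomized search stands in for the Bayesian optimization used in the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(0)
# Synthetic stand-ins for basic personal features (age, sex, ...) and outcome.
X = rng.normal(size=(400, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400) > 1).astype(int)

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    {"max_depth": [2, 3, 4], "n_estimators": [50, 100, 200]},
    n_iter=5, cv=3, random_state=0,
).fit(X, y)

# Mortality risk interpreted as a probability score for triage.
print(search.predict_proba(X[:3])[:, 1].round(3))
```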
【6】 Atrial Fibrillation: A Medical and Technological Review
标题:心房颤动的医学和技术回顾
链接:https://arxiv.org/abs/2109.08974
作者:Samayan Bhattacharya,Sk Shahnawaz
机构:Department of Computer Science and Engineering, Raja Subodh Chandra Mallick Rd, Jadavpur University Campus Area, Jadavpur, Kolkata, West Bengal , India
备注:11 pages, 1 figure
摘要:心房颤动(AF)是导致美国住院的最常见的心律失常类型(希腊语a-,丧失+心律,心律=丧失心律)。虽然有时房颤是无症状的,但除了降低健康相关生活质量(HRQOL)外,它还会增加患者中风和心力衰竭的风险。与房颤相关的医疗费用每年在60亿至260亿美元之间。早期发现房颤和临床关注有助于改善患者的症状和HRQOL,并降低护理成本。然而,房颤检测的普遍模式依赖于在单个时间点记录的心电图(ECG),并且没有阐明症状与心律或房颤的关系。在最近十年中,由于健康监视器的民主化和高性能计算机的出现,机器学习算法已被证明能有效地从患者的心电图中识别房颤。本文概述了房颤的症状、诊断以及该领域未来的研究前景。
摘要:Atrial Fibrillation (AF) is the most common type of arrhythmia (Greek a-,
loss + rhythmos, rhythm = loss of rhythm) leading to hospitalization in the
United States. Though sometimes AF is asymptomatic, it increases the risk of
stroke and heart failure in patients, in addition to lowering the
health-related quality of life (HRQOL). AF-related care costs the healthcare
system between $6.0 and $26 billion each year. Early detection of AF and
clinical attention can help improve symptoms and HRQOL of the patient, as well
as bring down the cost of care. However, the prevalent paradigm of AF detection
depends on electrocardiogram (ECG) recorded at a single point in time and does
not shed light on the relation of the symptoms with heart rhythm or AF. In the
recent decade, due to the democratization of health monitors and the advent of
high-performing computers, Machine Learning algorithms have been proven
effective in identifying AF, from the ECG of patients. This paper provides an
overview of the symptoms of AF, its diagnosis, and future prospects for
research in the field.
【7】 Predicting Visual Improvement after Macular Hole Surgery: a Cautionary Tale on Deep Learning with Very Limited Data
标题:预测黄斑裂孔手术后视力改善:数据非常有限的深度学习警示故事
链接:https://arxiv.org/abs/2109.09463
作者:M. Godbout,A. Lachance,F. Antaki,A. Dirani,A. Durand
摘要:我们研究了机器学习模型从术前数据(视网膜图像和临床特征)预测黄斑裂孔手术后视力改善的潜力。为了完成这项任务,我们收集了自己的数据,结果总共只有121个样本,这使我们的工作处于非常有限的数据体系中。我们探索了各种针对有限数据的深度学习方法来训练深度计算机视觉模型,发现所有测试的深度视觉模型在临床特征上都优于简单回归模型。我们相信这是令人信服的证据,证明在非常有限的数据上使用深度学习非常困难。
摘要:We investigate the potential of machine learning models for the prediction of
visual improvement after macular hole surgery from preoperative data (retinal
images and clinical features). Collecting our own data for the task, we end up
with only 121 total samples, putting our work in the very limited data regime.
We explore a variety of deep learning methods for limited data to train deep
computer vision models, finding that all tested deep vision models are
outperformed by a simple regression model on the clinical features. We believe
this is compelling evidence of the extreme difficulty of using deep learning on
very limited data.
【8】 Robust Automated Framework for COVID-19 Disease Identification from a Multicenter Dataset of Chest CT Scans
标题:从胸部CT多中心数据集中识别冠状病毒病的鲁棒自动化框架
链接:https://arxiv.org/abs/2109.09241
作者:Shahin Heidarian,Parnian Afshar,Nastaran Enshaei,Farnoosh Naderkhani,Moezedin Javad Rafiee,Anastasia Oikonomou,Akbar Shafiee,Pascal N. Tyrrell,Faranak Babaki Fard,Konstantinos N. Plataniotis,Arash Mohammadi
机构:Department of Electrical and Computer Engineering, Concordia University, Montreal, QC, Canada, Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Canada
摘要:本研究的目的是开发一个强大的基于深度学习的框架,以区分新冠病毒-19、社区获得性肺炎(CAP)和正常病例,该框架基于在不同成像中心使用不同方案和辐射剂量获得的胸部CT扫描。我们表明,虽然我们提出的模型是在使用特定扫描协议仅从一个成像中心获得的相对较小的数据集上训练的,但该模型在使用不同技术参数的多个扫描仪获得的异构测试集上表现良好。我们还表明,该模型可以通过无监督的方法进行更新,以处理训练集和测试集之间的数据偏移,并在接收到来自不同中心的新外部数据集时增强模型的鲁棒性。我们采用了集成架构来聚合来自模型多个版本的预测。出于初始训练和开发目的,使用了171例新冠病毒-19、60例CAP和76例正常病例的内部数据集,其中包含使用恒定标准辐射剂量扫描协议从一个成像中心获得的体积CT扫描。为了评估模型,我们回顾性地收集了四个不同的测试集,以调查数据特征的变化对模型性能的影响。在测试案例中,有与训练集特征相似的CT扫描,也有噪声较大的低剂量和超低剂量CT扫描。此外,一些测试CT扫描来自有心血管疾病或手术史的患者。本研究中使用的整个测试数据集包括51例COVID-19、28例CAP和51例正常病例。实验结果表明,我们提出的框架在所有测试集上都表现良好,实现了96.15%的总准确率(95%CI:[91.25-98.74])、96.08%的COVID-19灵敏度(95%CI:[86.54-99.5])和92.86%的CAP灵敏度(95%CI:[76.50-99.19])。
摘要:The objective of this study is to develop a robust deep learning-based
framework to distinguish COVID-19, Community-Acquired Pneumonia (CAP), and
Normal cases based on chest CT scans acquired in different imaging centers
using various protocols, and radiation doses. We showed that while our proposed
model is trained on a relatively small dataset acquired from only one imaging
center using a specific scanning protocol, the model performs well on
heterogeneous test sets obtained by multiple scanners using different technical
parameters. We also showed that the model can be updated via an unsupervised
approach to cope with the data shift between the train and test sets and
enhance the robustness of the model upon receiving a new external dataset from
a different center. We adopted an ensemble architecture to aggregate the
predictions from multiple versions of the model. For initial training and
development purposes, an in-house dataset of 171 COVID-19, 60 CAP, and 76
Normal cases was used, which contained volumetric CT scans acquired from one
imaging center using a constant standard radiation dose scanning protocol. To
evaluate the model, we collected four different test sets retrospectively to
investigate the effects of the shifts in the data characteristics on the
model's performance. Among the test cases, there were CT scans with similar
characteristics as the train set as well as noisy low-dose and ultra-low dose
CT scans. In addition, some test CT scans were obtained from patients with a
history of cardiovascular diseases or surgeries. The entire test dataset used
in this study contained 51 COVID-19, 28 CAP, and 51 Normal cases. Experimental
results indicate that our proposed framework performs well on all test sets
achieving total accuracy of 96.15% (95%CI: [91.25-98.74]), COVID-19 sensitivity
of 96.08% (95%CI: [86.54-99.5]), CAP sensitivity of 92.86% (95%CI:
[76.50-99.19]).
【9】 Optimal Ensemble Construction for Multi-Study Prediction with Applications to COVID-19 Excess Mortality Estimation
标题:多研究预测的最优集成构造及其在冠状病毒超额死亡率估计中的应用
链接:https://arxiv.org/abs/2109.09164
作者:Gabriel Loewinger,Rolando Acosta Nunez,Rahul Mazumder,Giovanni Parmigiani
机构:†Department of Biostatistics, Harvard School of Public Health, Boston, MA, ‡MIT Sloan School of Management, Operations Research Center and MIT Center for Statistics, Cambridge, MA, ∗Department of Data Science, Dana Farber Cancer Institute, Boston, MA
备注:Manuscript: 26 pages, 6 figures, 4 tables; Supplement: 18 pages, 11 figures, 10 tables
摘要:在生物医学科学中,遇到预测任务越来越常见,因为有多个数据集可用于模型训练。当数据集是异构的时,共用数据集和应用标准统计学习方法等常用方法可能会导致较差的研究外预测性能。理论和应用研究表明$\textit{multi-study ensembling}$是一种可行的替代方案,它以促进模型通用性的方式利用了数据集的可变性。多研究集成使用两阶段$\textit{stacking}$策略,该策略适合研究特定模型,并分别估计集成权重。然而,这种方法忽略了模型拟合阶段的集合特性,可能导致效率损失。因此,我们提出了$\textit{optimal Englose construction}$,这是一种多研究叠加的$\textit{all-in-one}$方法,通过这种方法,我们可以联合估计集合权重以及与每个研究特定模型相关的参数。我们证明了我们方法的局限性产生了现有的方法,如多研究叠加和在模型拟合前汇集数据集。我们提出了一种有效的块坐标下降算法来优化所提出的损失函数。我们将我们的方法与标准方法进行比较,将其应用于多国新冠病毒-19数据集进行基线死亡率预测。我们表明,当一个国家在大流行爆发前几乎没有可用数据时,利用其他国家的数据可以大大提高预测准确性。重要的是,在这个应用中,我们的方法优于多研究叠加和其他标准方法。我们进一步描述了该方法在数据驱动和其他模拟中的性能。在一系列研究间异质性水平上,我们的方法与多研究叠加法和其他早期方法相比仍具有竞争力或优于多研究叠加法。
摘要:It is increasingly common to encounter prediction tasks in the biomedical
sciences for which multiple datasets are available for model training. Common
approaches such as pooling datasets and applying standard statistical learning
methods can result in poor out-of-study prediction performance when datasets
are heterogeneous. Theoretical and applied work has shown $\textit{multi-study
ensembling}$ to be a viable alternative that leverages the variability across
datasets in a manner that promotes model generalizability. Multi-study
ensembling uses a two-stage $\textit{stacking}$ strategy which fits
study-specific models and estimates ensemble weights separately. This approach
ignores, however, the ensemble properties at the model-fitting stage,
potentially resulting in a loss of efficiency. We therefore propose
$\textit{optimal ensemble construction}$, an $\textit{all-in-one}$ approach to
multi-study stacking whereby we jointly estimate ensemble weights as well as
parameters associated with each study-specific model. We prove that limiting
cases of our approach yield existing methods such as multi-study stacking and
pooling datasets before model fitting. We propose an efficient block coordinate
descent algorithm to optimize the proposed loss function. We compare our
approach to standard methods by applying it to a multi-country COVID-19 dataset
for baseline mortality prediction. We show that when little data is available
for a country before the onset of the pandemic, leveraging data from other
countries can substantially improve prediction accuracy. Importantly, our
approach outperforms multi-study stacking and other standard methods in this
application. We further characterize the method's performance in data-driven
and other simulations. Our method remains competitive with or outperforms
multi-study stacking and other earlier methods across a range of between-study
heterogeneity levels.
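For contrast with the proposed all-in-one estimation, the classic two-stage multi-study stacking baseline looks like this; the three synthetic "studies" with shifted coefficients are illustrative stand-ins for heterogeneous country datasets.

```python
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression

rng = np.random.default_rng(0)

def make_study(shift):
    # Same features, study-specific coefficients: a toy model of heterogeneity.
    X = rng.normal(size=(100, 4))
    y = X @ (np.array([1.0, -1.0, 0.5, 0.0]) + shift) + rng.normal(size=100)
    return X, y

studies = [make_study(s) for s in (-0.3, 0.0, 0.3)]

# Stage 1: fit one model per study.
models = [Ridge().fit(X, y) for X, y in studies]

# Stage 2: learn nonnegative ensemble weights by regressing held-out
# outcomes on the stacked study-specific predictions.
X_val, y_val = make_study(0.1)
preds = np.column_stack([m.predict(X_val) for m in models])
stack = LinearRegression(positive=True, fit_intercept=False).fit(preds, y_val)
print("ensemble weights:", stack.coef_.round(2))
```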
【10】 A survey on deep learning approaches for breast cancer diagnosis
标题:深度学习方法在乳腺癌诊断中的研究进展
链接:https://arxiv.org/abs/2109.08853
作者:Timothy Kwong,Samaneh Mazaheri
机构:Ontario Tech University, Oshawa, ON Canada
摘要:深度学习引入了几种基于学习的方法来识别乳腺肿瘤,在乳腺癌诊断中具有很高的适用性。它是计算机辅助诊断(CAD)系统中的一个实用装置,可进一步帮助放射科医生进行不同模式的诊断。根据医院或公共数据库提供的图像训练的深度学习网络可以执行病变类型的分类、检测和分割。在2D图像上识别肿瘤已经取得了重大进展,但到目前为止,识别3D图像仍然是一个前沿领域。不同研究领域之间深度学习网络的互联有助于推动发现更高效、准确和强健的网络。在这篇综述文章中,将探讨以下主题:(i)深度学习的理论和应用,(ii)从性能指标的角度来看,2D、2.5D和3D CNN方法在乳腺肿瘤识别中的进展,以及(iii)CNN方法面临的挑战。
摘要:Deep learning has introduced several learning-based methods to recognize
breast tumours and presents high applicability in breast cancer diagnostics. It
has presented itself as a practical installment in Computer-Aided Diagnostic
(CAD) systems to further assist radiologists in diagnostics for different
modalities. A deep learning network trained on images provided by hospitals or
public databases can perform classification, detection, and segmentation of
lesion types. Significant progress has been made in recognizing tumours on 2D
images but recognizing 3D images remains a frontier so far. The interconnection
of deep learning networks between different fields of study helps propel
discoveries for more efficient, accurate, and robust networks. In this review
paper, the following topics will be explored: (i) theory and application of
deep learning, (ii) progress of 2D, 2.5D, and 3D CNN approaches in breast
tumour recognition from a performance metric perspective, and (iii) challenges
faced in CNN approaches.
【11】 Segmentation of Brain MRI using an Altruistic Harris Hawks' Optimization algorithm
标题:利他主义Harris Hawks优化算法在脑MRI分割中的应用
链接:https://arxiv.org/abs/2109.08688
作者:Rajarshi Bandyopadhyay,Rohit Kundu,Diego Oliva,Ram Sarkar
机构:Department of Electrical Engineering, Jadavpur University, Raja S.C. Mullick Road, Kolkata-, West Bengal, INDIA, Depto. Universidad de Guadalajara, CUCEI, Av. Revolucion , Guadalajara, Jal, MEXICO, School of Computer Science & Robotics
备注:None
摘要:当数字图像用于疾病诊断时,尤其是在分析和疾病识别等后续任务中,分割是医学中的一项基本要求。脑磁共振图像(MRI)的有效分割是放射科医生最关心的问题,因为其光照差以及与图像采集相关的其他条件。阈值分割是一种流行的分割方法,它使用图像的直方图将不同的同质像素组标记为不同的类别。然而,计算成本随着阈值的数量呈指数增长。在本文中,我们使用进化元启发式进行多级阈值分割。它是Harris Hawks优化(HHO)算法的改进版本,结合了混沌初始化和利他主义的概念。此外,对于适应度分配,我们使用了一个混合目标函数,在交叉熵最小化的同时,我们应用了一个新的熵函数,并利用两个目标函数的权重来形成一个新的混合方法。HHO最初设计用于解决数值优化问题。早些时候,统计结果和比较表明,与成熟的元启发式技术相比,HHO提供了非常有希望的结果。在本文中,利他主义已被纳入HHO算法,以增强其利用能力。我们使用一些标准的评估指标,对来自哈佛医学院WBA数据库的10幅基准图像和来自Brainweb数据集的8幅基准图像进行了评估。
摘要:Segmentation is an essential requirement in medicine when digital images are
used in illness diagnosis, especially, in posterior tasks as analysis and
disease identification. An efficient segmentation of brain Magnetic Resonance
Images (MRIs) is of prime concern to radiologists due to their poor
illumination and other conditions related to the acquisition of the images.
Thresholding is a popular method for segmentation that uses the histogram of an
image to label different homogeneous groups of pixels into different classes.
However, the computational cost increases exponentially according to the number
of thresholds. In this paper, we perform the multi-level thresholding using an
evolutionary metaheuristic. It is an improved version of the Harris Hawks
Optimization (HHO) algorithm that combines the chaotic initialization and the
concept of altruism. Further, for fitness assignment, we use a hybrid objective
function where along with the cross-entropy minimization, we apply a new
entropy function, and leverage weights to the two objective functions to form a
new hybrid approach. The HHO was originally designed to solve numerical
optimization problems. Earlier, the statistical results and comparisons have
demonstrated that the HHO provides very promising results compared with
well-established metaheuristic techniques. In this article, the altruism has
been incorporated into the HHO algorithm to enhance its exploitation
capabilities. We evaluate the proposed method over 10 benchmark images from the
WBA database of the Harvard Medical School and 8 benchmark images from the
Brainweb dataset using some standard evaluation metrics.
蒸馏|知识提取(1篇)
【1】 Probabilistic Bearing Fault Diagnosis Using Gaussian Process with Tailored Feature Extraction
标题:基于定制特征提取的高斯过程概率轴承故障诊断
链接:https://arxiv.org/abs/2109.09189
作者:Mingxuan Liang,Kai Zhou
机构:College of Mechanical and Electrical Engineering, China Jiliang University, Research Assistant Professor, Department of Mechanical Engineering-Engineering Mechanics, Michigan Technological University, † Corresponding author
摘要:滚动轴承在恶劣环境下长期运行,容易发生各种故障,导致机械系统的意外故障,造成严重事故。近年来,深度学习方法在数据驱动的轴承故障诊断中得到了越来越广泛的应用。然而,目前的深度学习方法以确定性分类的形式进行轴承故障诊断,忽略了实际应用中不可避免的不确定性。为了解决这一问题,本研究开发了一个概率故障诊断框架,能够考虑预测中的不确定性影响,具有实际意义。该框架充分利用了高斯过程分类器(GPC)的概率特性。为了便于建立高保真GPC,可通过基于交叉验证的网格搜索,在由各种核主成分分析(KPCA)方法和堆叠自动编码器组成的预先指定的方法池上,优化确定具有降维方法的定制特征提取。该策略可以保证特征和故障之间复杂的非线性关系得到充分的表征。此外,还采用了传感器融合的概念来提高诊断性能。与传统的深度学习方法相比,该框架通常需要更少的标记数据和更少的参数调整工作。使用可公开访问的实验滚动轴承数据集进行了系统的案例研究,以验证这一新框架。对影响故障诊断性能的各种因素也进行了深入研究。
摘要:Rolling bearings are subject to various faults due to its long-time operation
under harsh environment, which will lead to unexpected breakdown of machinery
system and cause severe accidents. Deep learning methods recently have gained
growing interests and extensively applied in the data-driven bearing fault
diagnosis. However, current deep learning methods perform the bearing fault
diagnosis in the form of deterministic classification, which overlook the
uncertainties that inevitably exist in actual practice. To tackle this issue,
in this research we develop a probabilistic fault diagnosis framework that can
account for the uncertainty effect in prediction, which bears practical
significance. This framework fully leverages the probabilistic feature of
Gaussian process classifier (GPC). To facilitate the establishment of
high-fidelity GPC, the tailored feature extraction with dimensionality
reduction method can be optimally determined through the cross validation-based
grid search upon a prespecified method pool consisting of various kernel
principal component analysis (KPCA) methods and stacked autoencoder. This
strategy can ensure the complex nonlinear relations between the features and
faults to be adequately characterized. Furthermore, the sensor fusion concept
is adopted to enhance the diagnosis performance. As compared with the
traditional deep learning methods, this proposed framework usually requires
less labeled data and less effort for parameter tuning. Systematic case studies
using the publicly accessible experimental rolling bearing dataset are carried
out to validate this new framework. Various influencing factors on fault
diagnosis performance also are thoroughly investigated.
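The probabilistic pipeline (KPCA feature extraction selected by cross-validated grid search, feeding a Gaussian process classifier) maps naturally onto a scikit-learn pipeline; the dataset and the reduced method pool below are illustrative stand-ins for the bearing vibration features.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# KPCA feature extraction followed by a GPC; the grid over KPCA kernels is
# a simplified version of the prespecified method pool.
pipe = Pipeline([("kpca", KernelPCA(n_components=5)),
                 ("gpc", GaussianProcessClassifier(random_state=0))])
search = GridSearchCV(pipe, {"kpca__kernel": ["rbf", "poly", "sigmoid"]}, cv=3)
search.fit(X, y)

print("selected extraction:", search.best_params_)
print("fault probabilities:", search.predict_proba(X[:2]).round(3))
```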
推荐(1篇)
【1】 Inductive Conformal Recommender System
标题:归纳共形推荐系统
链接:https://arxiv.org/abs/2109.08949
作者:Venkateswara Rao Kagita,Arun K Pujari,Vineet Padmanabhan,Vikas Kumar
机构:National Institute of Technology, Warangal, India, Mahindra University, Hyderbad, India, University of Hyderabad, Hyderbad, India, University of Delhi, Delhi, India, Central University of Rajasthan, Rajasthan, India
备注:19 pages
摘要:传统的推荐算法发展了一些技术,可以帮助人们选择想要的项目。然而,在许多实际应用中,除了一组推荐外,还必须量化每个推荐的(非)确定性。共形推荐系统利用用户的经验输出一组推荐,每个推荐都与精确的置信值相关联。给定显著性水平$\varepsilon$,它为做出错误推荐的概率提供了一个界$\varepsilon$。共形框架使用了一个称为“不一致性度量”的关键概念,该概念度量一个项目相对于其他项目的陌生程度。任何共形推荐框架的重要设计挑战之一是将不一致性度量与推荐算法相结合。本文介绍了共形推荐系统的一种归纳变体。我们提出并分析了归纳设置下的不同不一致性度量。我们还提供了关于误差界和时间复杂度的理论证明。对十个基准数据集的大量实证分析表明,归纳变体在保持准确性的同时,显著改善了计算时间。
摘要:Traditional recommendation algorithms develop techniques that can help people
to choose desirable items. However, in many real-world applications, along with
a set of recommendations, it is also essential to quantify each
recommendation's (un)certainty. The conformal recommender system uses the
experience of a user to output a set of recommendations, each associated with a
precise confidence value. Given a significance level $\varepsilon$, it provides
a bound $\varepsilon$ on the probability of making a wrong recommendation. The
conformal framework uses a key concept called the nonconformity measure, which
measures the strangeness of an item with respect to other items. One of the
significant design challenges of any conformal recommendation framework is
integrating nonconformity measure with the recommendation algorithm. In this
paper, we introduce an inductive variant of a conformal recommender system. We
propose and analyze different nonconformity measures in the inductive setting.
We also provide theoretical proofs on the error-bound and the time complexity.
Extensive empirical analysis on ten benchmark datasets demonstrates that the
inductive variant substantially improves the performance in computation time
while preserving the accuracy.
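A minimal sketch of the inductive conformal step (with a made-up distance-based nonconformity measure; the paper's measures differ): scores are computed once on a calibration split, and an item is recommended only if its conformal p-value exceeds the significance level epsilon.

import numpy as np

def p_value(cal_scores, test_score):
    # Fraction of calibration nonconformity scores at least as strange as the test one.
    return (np.sum(cal_scores >= test_score) + 1) / (len(cal_scores) + 1)

rng = np.random.default_rng(0)
liked = rng.normal(size=(50, 8))             # calibration split: items the user liked
center = liked.mean(axis=0)
cal_scores = np.linalg.norm(liked - center, axis=1)  # toy nonconformity: distance to profile

candidates = rng.normal(size=(100, 8))
eps = 0.1                                    # bound on the wrong-recommendation probability
recommended = [i for i, c in enumerate(candidates)
               if p_value(cal_scores, np.linalg.norm(c - center)) > eps]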
聚类(1篇)
【1】 Local versions of sum-of-norms clustering
标题:范数和聚类的局部版本
链接:https://arxiv.org/abs/2109.09589
作者:Alexander Dunlap,Jean-Christophe Mourrat
备注:16 pages
摘要:范数和聚类是一个凸优化问题,其解可用于多变量数据的聚类。我们提出并研究了该方法的一个局部化版本,并特别证明了它可以分离随机球模型中任意接近的球。更准确地说,我们证明了不相交连通集聚类误差的一个定量界。该界由数据点的数量和泛函的局部化长度表示。
摘要:Sum-of-norms clustering is a convex optimization problem whose solution can
be used for the clustering of multivariate data. We propose and study a
localized version of this method, and show in particular that it can separate
arbitrarily close balls in the stochastic ball model. More precisely, we prove
a quantitative bound on the error incurred in the clustering of disjoint
connected sets. Our bound is expressed in terms of the number of datapoints and
the localization length of the functional.
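In symbols, the localized variant minimizes $\frac{1}{2}\sum_i \|x_i-u_i\|^2+\lambda\sum_{i,j} w_{ij}\|u_i-u_j\|$, with weights $w_{ij}=1$ only for pairs within the localization length $h$. A toy smoothed-gradient solver follows (real implementations would use proximal or ADMM-type convex solvers):

import numpy as np

def local_sum_of_norms(X, lam=0.5, h=1.0, steps=500, lr=0.01, delta=1e-8):
    # Localized sum-of-norms clustering: only pairs within distance h are penalized.
    W = (np.linalg.norm(X[:, None] - X[None, :], axis=2) <= h).astype(float)
    np.fill_diagonal(W, 0.0)                   # each pair is counted twice; absorbed into lam
    U = X.astype(float).copy()
    for _ in range(steps):
        diff = U[:, None] - U[None, :]         # (n, n, d) pairwise differences
        norms = np.sqrt((diff ** 2).sum(-1) + delta)   # smoothed pair norms
        grad = (U - X) + lam * (W[..., None] * diff / norms[..., None]).sum(axis=1)
        U -= lr * grad
    return U  # points whose rows of U (nearly) coincide belong to one cluster

U = local_sum_of_norms(np.random.randn(30, 2))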
超分辨率|去噪|去模糊|去雾(2篇)
【1】 CNN-based Temporal Super Resolution of Radar Rainfall Products
标题:基于CNN的雷达降雨产品时间超分辨率研究
链接:https://arxiv.org/abs/2109.09289
作者:Muhammed Sit,Bong-Chul Seo,Ibrahim Demir
机构:University of Iowa, Iowa City, IA
摘要:降雨数据的时间和空间分辨率对于气候变化建模研究至关重要,在气候变化建模研究中,降雨数据的空间和时间变异性被视为主要因素。不同遥感仪器(如雷达或卫星)的降雨产品由于其传感能力的不同而提供不同的时空分辨率。我们开发了一种方法,通过增加时间分辨率来增加降雨数据,以补充相对较低分辨率的产品。本研究提出了一种基于卷积神经网络(CNNs)的神经网络结构,以提高雷达降雨产品的时间分辨率,并将该模型与基于光流的插值方法进行了比较。
摘要:The temporal and spatial resolution of rainfall data is crucial for climate
change modeling studies in which its variability in space and time is
considered as a primary factor. Rainfall products from different remote sensing
instruments (e.g., radar or satellite) provide different space-time resolutions
because of the differences in their sensing capabilities. We developed an
approach that augments rainfall data with increased time resolutions to
complement relatively lower resolution products. This study proposes a neural
network architecture based on Convolutional Neural Networks (CNNs) to improve
the temporal resolution of radar-based rainfall products and compares the proposed
model with an optical flow-based interpolation method.
【2】 Removing Noise from Extracellular Neural Recordings Using Fully Convolutional Denoising Autoencoders
标题:用全卷积去噪自动编码器去除细胞外神经记录中的噪声
链接:https://arxiv.org/abs/2109.08945
作者:Christodoulos Kechris,Alexandros Delitzas,Vasileios Matsoukas,Panagiotis C. Petrantonakis
机构:Department of Electrical and Computer Engineering, Greece; Information Technologies Institute
备注:Accepted version to be published in the 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2021)
摘要:细胞外记录受到大量噪声源的严重污染,使得去噪过程成为一项极具挑战性的任务,需要有效的尖峰分类。为此,我们提出了一种端到端的深度学习方法,利用一个完全卷积去噪的自动编码器,该编码器学习从有噪声的多通道输入中产生干净的神经元活动信号。对模拟数据的实验结果表明,该方法能显著改善噪声污染神经信号的质量,优于广泛使用的小波去噪技术。
摘要:Extracellular recordings are severely contaminated by a considerable number
of noise sources, rendering the denoising process an extremely challenging task
that should be tackled for efficient spike sorting. To this end, we propose an
end-to-end deep learning approach to the problem, utilizing a Fully
Convolutional Denoising Autoencoder, which learns to produce a clean neuronal
activity signal from a noisy multichannel input. The experimental results on
simulated data show that our proposed method can significantly improve the
quality of noise-corrupted neural signals, outperforming widely used wavelet
denoising techniques.
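A minimal 1-D fully convolutional denoising autoencoder in this spirit (a sketch with assumed channel counts, kernel sizes, and segment length, not the paper's exact architecture):

import torch
import torch.nn as nn

class FCDAE(nn.Module):
    # Fully convolutional denoising autoencoder for multichannel recordings.
    def __init__(self, channels=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(channels, 32, 9, stride=2, padding=4), nn.ReLU(),
            nn.Conv1d(32, 64, 9, stride=2, padding=4), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(64, 32, 9, stride=2, padding=4, output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(32, channels, 9, stride=2, padding=4, output_padding=1))
    def forward(self, noisy):
        return self.decoder(self.encoder(noisy))

model = FCDAE()
noisy = torch.randn(8, 4, 1024)                 # batch of noisy multichannel segments
clean = torch.randn(8, 4, 1024)                 # training targets: clean activity signals
loss = nn.functional.mse_loss(model(noisy), clean)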
自动驾驶|车辆|车道检测等(5篇)
【1】 Predicting vehicles parking behaviour in shared premises for aggregated EV electricity demand response programs
标题:为综合电动汽车电力需求响应计划预测共享场所中的车辆停车行为
链接:https://arxiv.org/abs/2109.09666
作者:Vinicius Monteiro de Lira,Fabiano Pallonetto,Lorenzo Gabrielli,Chiara Renso
机构:Maynooth University; Institute of Information Science and Technologies
摘要:2020年,全球电动汽车销量继续超出预期,达到300多万辆,市场份额超过4%。然而,可再生能源普及率的提高以及电动汽车(EV)的出现及其额外的电力需求所导致的发电不确定性可能会在配电和输电层面上对电力系统造成压力。需求响应聚合和负载控制将使电网更稳定,可再生能源更深入电网。目前的工作符合这一背景,支持停车场内电动汽车的充电优化,假设电动汽车在系统中的普及率很高。我们提出了一种预测共享停车场停车时间的方法,目的是估计特定停车场的能源需求,评估最佳电动汽车充电计划,并将该计划集成到智能控制器中。我们将预测问题形式化为一个有监督的机器学习任务,以在车辆离开停车位之前预测停车事件的持续时间。该预测持续时间为能源管理系统供电,该系统将在整个持续时间内分配电力,从而降低总峰值电力需求。受两个研究问题的启发,我们构建了我们的实验,旨在发现所提出的机器学习方法的准确性和预测模型的最相关特征。我们对来自意大利和巴西两个不同校园设施的4个数据集进行了不同算法和功能组合的实验。与基于频率的统计分析相比,使用上下文和时间特征,模型的总体结果显示出更高的准确性,这表明开发共享停车场能源管理系统准确预测器的可行途径
摘要:Global electric car sales in 2020 continued to exceed expectations,
climbing to over 3 million and reaching a market share of over 4%. However,
uncertainty of generation caused by higher penetration of renewable energies
and the advent of Electrical Vehicles (EV) with their additional electricity
demand could cause strains to the power system, both at distribution and
transmission levels. Demand response aggregation and load control will enable
greater grid stability and greater penetration of renewable energies into the
grid. The present work fits this context by supporting charging optimization
for EVs in parking premises, assuming an incumbent high penetration of EVs in
the system. We propose a methodology to predict the parking duration in shared
parking premises, with the objective of estimating the energy requirement of a
specific parking lot, evaluating the optimal EV charging schedule, and
integrating the scheduling into a smart controller. We formalize the
prediction problem as a supervised machine learning task to predict the
duration of the parking event before the car leaves the slot. This predicted
duration feeds the energy management system that will allocate the power over
the duration reducing the overall peak electricity demand. We structure our
experiments inspired by two research questions aiming to discover the accuracy
of the proposed machine learning approach and the most relevant features for
the prediction models. We experiment different algorithms and features
combination for 4 datasets from 2 different campus facilities in Italy and
Brazil. Using both contextual and time of the day features, the overall results
of the models shows an higher accuracy compared to a statistical analysis based
on frequency, indicating a viable route for the development of accurate
predictors for sharing parking premises energy management systems
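As a sketch of the supervised formulation (features and labels below are synthetic placeholders; the paper's feature set is richer), a standard regressor maps contextual and time-of-day features observed at arrival to the duration that feeds the charging scheduler:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
# Placeholder features per parking event: weekday, user category, arrival hour.
X = np.c_[rng.integers(0, 7, 1000), rng.integers(0, 3, 1000), rng.uniform(0, 24, 1000)]
y = rng.uniform(0.5, 10.0, 1000)                # parking duration in hours (synthetic)

model = GradientBoostingRegressor().fit(X, y)
predicted_hours = model.predict(X[:5])          # fed to the EV charging scheduler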
【2】 Description of Corner Cases in Automated Driving: Goals and Challenges
标题:自动驾驶转角案例描述:目标与挑战
链接:https://arxiv.org/abs/2109.09607
作者:Daniel Bogdoll,Jasmin Breitenstein,Florian Heidecker,Maarten Bieshaar,Bernhard Sick,Tim Fingscheidt,J. Marius Zöllner
机构:J. Marius Zöllner
摘要:扩展自动化车辆的分布需要处理各种意外和可能的危险情况,称为拐角案例(CC)。由于自动驾驶系统的许多模块都基于机器学习(ML),CC是其开发数据的重要组成部分。然而,大规模数据收集中的CC数据数量有限,这使得它们在ML环境中具有挑战性。通过更好地理解CC,可以改进离线应用程序(如数据集分析)和在线方法(如提高自动驾驶系统的性能)。虽然CC有基于知识的描述和分类法,但很少有关于机器可解释描述的研究。在这个扩展的摘要中,我们将简要概述这种描述的挑战和目标。
摘要:Scaling the distribution of automated vehicles requires handling various
unexpected and possibly dangerous situations, termed corner cases (CC). Since
many modules of automated driving systems are based on machine learning (ML),
CC are an essential part of the data for their development. However, there is
only a limited amount of CC data in large-scale data collections, which makes
them challenging in the context of ML. A better understanding of CC would
benefit both offline applications, e.g., dataset analysis, and online methods,
e.g., improving the performance of automated driving systems. While there
are knowledge-based descriptions and taxonomies for CC, there is little
research on machine-interpretable descriptions. In this extended abstract, we
will give a brief overview of the challenges and goals of such a description.
【3】 Traffic-Net: 3D Traffic Monitoring Using a Single Camera
标题:交通网:单摄像机三维交通监控
链接:https://arxiv.org/abs/2109.09165
作者:Mahdi Rezaei,Mohsen Azarmi,Farzam Mohammad Pour Mir
机构:★, Institute for Transport Studies, The University of Leeds, Leeds, LS,JT, UK, Department of Computer Engineering, Qazvin University, Qazvin, IR, Tehran Azad University, Science & Research Branch, IR
摘要:计算机视觉在智能交通系统(ITS)和交通监控中发挥了重要作用。随着自动化车辆和拥挤城市的快速增长,采用视频监控基础设施的自动化和先进交通管理系统(ATM)已经通过深度神经网络的实现而发展。在本研究中,我们提供了一个实用的实时交通监控平台,包括3D车辆/行人检测、速度检测、轨迹估计、拥堵检测,以及监控车辆和行人的交互,所有这些都使用单个CCTV交通摄像头。我们采用定制的YOLOv5深度神经网络模型进行车辆/行人检测,并采用增强的排序跟踪算法。首次开发了一种用于摄像机自动校准的卫星-地面混合逆透视映射(SG-IPM)方法,该方法可实现精确的三维目标检测和可视化。我们还开发了基于短期和长期时间视频数据流的分层交通建模解决方案,以了解弱势道路用户的交通流、瓶颈和危险点。使用各种交通监控数据集,包括从不同照明和天气条件下的公路、交叉口和城市区域收集的MIO-TCD、UA-DETRAC和GRAM-RTM,对真实场景进行了若干实验,并与最新技术进行了比较。
摘要:Computer Vision has played a major role in Intelligent Transportation
Systems (ITS) and traffic surveillance. Along with the rapid growth of
automated vehicles and crowded cities, automated and advanced traffic
management systems (ATMS) using video surveillance infrastructures have evolved
through the implementation of Deep Neural Networks. In this research, we provide a
practical platform for real-time traffic monitoring, including 3D
vehicle/pedestrian detection, speed detection, trajectory estimation,
congestion detection, as well as monitoring the interaction of vehicles and
pedestrians, all using a single CCTV traffic camera. We adapt a custom YOLOv5
deep neural network model for vehicle/pedestrian detection and an enhanced SORT
tracking algorithm. For the first time, a hybrid satellite-ground based inverse
perspective mapping (SG-IPM) method for camera auto-calibration is also
developed which leads to an accurate 3D object detection and visualisation. We
also develop a hierarchical traffic modelling solution based on short- and
long-term temporal video data stream to understand the traffic flow,
bottlenecks, and risky spots for vulnerable road users. Several experiments on
real-world scenarios and comparisons with state-of-the-art are conducted using
various traffic monitoring datasets, including MIO-TCD, UA-DETRAC and GRAM-RTM
collected from highways, intersections, and urban areas under different
lighting and weather conditions.
【4】 Dynamic and Systematic Survey of Deep Learning Approaches for Driving Behavior Analysis
标题:驾驶行为分析深度学习方法的动态系统研究
链接:https://arxiv.org/abs/2109.08996
作者:Farid Talebloo,Emad A. Mohammed,Behrouz H. Far
机构:Department of Electrical and Software Engineering, University of Calgary, Calgary, Alberta, Canada
摘要:不当驾驶会导致死亡、损坏、能源消耗增加和车辆贬值。分析驾驶行为可以优化和避免上述问题。通过识别驾驶类型并将其映射到该类型驾驶的后果,我们可以得到一个模型来防止它们。在这方面,我们试图创建一份动态调查报告,以审查和展示驾驶行为调查数据,供未来研究人员使用。通过分析58篇文章,我们试图对标准方法进行分类,并为将来的文章提供一个框架,以便在不同的仪表盘中进行检查和研究,并更新趋势。
摘要:Improper driving results in fatalities, damage, increased energy
consumption, and depreciation of vehicles. Analyzing driving behaviour could
help optimize and avoid these issues. By identifying the type of driving and
mapping it to the consequences of that type of driving, we can obtain a model
to prevent them. In this regard, we try to create a dynamic survey paper to
review and present driving behaviour survey data for future researchers. By
analyzing 58 articles, we attempt to classify standard methods and provide a
framework for future articles to be examined and studied in different
dashboards and updated with trends.
【5】 Visual Representation Learning for Preference-Aware Path Planning
标题:视觉表征学习在偏好感知路径规划中的应用
链接:https://arxiv.org/abs/2109.08968
作者:Kavan Singh Sikand,Sadegh Rabiee,Adam Uccello,Xuesu Xiao,Garrett Warnell,Joydeep Biswas
机构:Department of Computer Science, The University of Texas at Austin
备注:7 pages, 6 figures
摘要:部署在室外环境中的自主移动机器人必须考虑不同类型的地形,以确保安全(例如,偏好泥土而非泥泞)和满足部署人员的偏好(例如,喜欢泥土路径而不是花坛)。针对这种偏好感知路径规划问题的大多数现有解决方案都使用语义分割从摄像机图像中对地形类型进行分类,然后将成本归于每种类型。不幸的是,这种方法有三个关键限制——1)需要预先枚举离散地形类型,2)无法处理混合地形类型(例如,长草的泥地),以及3)需要昂贵的标记数据来训练视觉语义分割。我们将视觉表示学习引入偏好感知路径规划(VRL-PAP),这是一种克服所有三个限制的替代方法:VRL-PAP利用未标记的人类导航演示自动生成三元组,用于学习视点不变的地形视觉表示,并在连续表示空间中编码地形类型。然后将学习到的表示与相同的未标记人类导航演示一起使用,以学习从表示空间到地形成本的映射。在运行时,VRL-PAP将图像映射到表示,再将表示映射到成本,以执行偏好感知路径规划。我们展示了具有挑战性的室外环境中的经验结果,证明VRL-PAP 1)能够成功地选择反映所演示偏好的路径,2)在执行上与具有高度详细手动注释地图的几何导航相当(而无需此类注释),3)能够以最少的额外未标记演示推广到新的地形类型。
摘要:Autonomous mobile robots deployed in outdoor environments must reason about
different types of terrain for both safety (e.g., prefer dirt over mud) and
deployer preferences (e.g., prefer dirt path over flower beds). Most existing
solutions to this preference-aware path planning problem use semantic
segmentation to classify terrain types from camera images, and then ascribe
costs to each type. Unfortunately, there are three key limitations of such
approaches -- they 1) require pre-enumeration of the discrete terrain types, 2)
are unable to handle hybrid terrain types (e.g., grassy dirt), and 3) require
expensive labelled data to train visual semantic segmentation. We introduce
Visual Representation Learning for Preference-Aware Path Planning (VRL-PAP), an
alternative approach that overcomes all three limitations: VRL-PAP leverages
unlabeled human demonstrations of navigation to autonomously generate triplets
for learning visual representations of terrain that are viewpoint invariant and
encode terrain types in a continuous representation space. The learned
representations are then used along with the same unlabeled human navigation
demonstrations to learn a mapping from the representation space to terrain
costs. At run time, VRL-PAP maps from images to representations and then
representations to costs to perform preference-aware path planning. We present
empirical results from challenging outdoor settings that demonstrate VRL-PAP 1)
is successfully able to pick paths that reflect demonstrated preferences, 2) is
comparable in execution to geometric navigation with a highly detailed manually
annotated map (without requiring such annotations), 3) is able to generalize to
novel terrain types with minimal additional unlabeled demonstrations.
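The representation-learning step can be sketched as triplet learning in PyTorch (the tiny embedding CNN below is a placeholder, not the paper's network); anchors and positives are assumed to be patches of the same terrain seen from different viewpoints along a demonstrated path, and negatives patches from elsewhere:

import torch
import torch.nn as nn

# Placeholder embedding network mapping image patches to a continuous terrain space.
embed = nn.Sequential(nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 32))
triplet = nn.TripletMarginLoss(margin=0.5)

anchor, positive, negative = (torch.randn(8, 3, 64, 64) for _ in range(3))
loss = triplet(embed(anchor), embed(positive), embed(negative))
loss.backward()  # pulls same-terrain views together, pushes other terrain away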
点云|SLAM|雷达|激光|深度RGBD相关(2篇)
【1】 Impact of Surface and Pore Characteristics on Fatigue Life of Laser Powder Bed Fusion Ti-6Al-4V Alloy Described by Neural Network Models
标题:表面和孔隙特性对激光粉床熔化Ti-6Al-4V合金疲劳寿命影响的神经网络模型
链接:https://arxiv.org/abs/2109.09655
作者:Seunghyun Moon,Ruimin Ma,Ross Attardo,Charles Tomonto,Mark Nordin,Paul Wheelock,Michael Glavicic,Maxwell Layman,Richard Billo,Tengfei Luo
机构:Department of Aerospace and Mechanical Engineering, University of Notre Dame, IN; 3D Printing Center, Johnson & Johnson, Miami, FL; Rolls-Royce Corporation, S. Meridian St., Indianapolis, IN
摘要:在本研究中,研究了表面粗糙度和孔隙特征对激光粉末床熔合(LPBF)Ti-6Al-4V零件疲劳寿命的影响。197根疲劳条使用相同的激光功率打印,但扫描速度不同。这些作用导致了微尺度孔隙几何结构的变化,这种变化通过显微计算机断层扫描进行表征。为了在疲劳条中产生表面粗糙度的差异,一半样品进行喷砂处理,另一半进行机加工。从表面粗糙度和气孔统计数据方面分析了疲劳行为。对于喷砂样品,LPBF策略中的轮廓激光扫描导致孔隙耗尽区隔离具有不同特征的表面和内部孔隙。对于表面孔隙类似于内部孔隙的机加工试样,疲劳寿命与垂直于应力方向的平面上的平均孔径和投影孔隙面积高度相关。最后,采用了一种基于脱落神经网络(DONN)的机器学习模型,建立了表面和孔隙特征与疲劳数据(logN)之间的联系,并证明了良好的预测精度。除了预测疲劳寿命外,DONN还可以估计预测不确定性。
摘要:In this study, the effects of surface roughness and pore characteristics on
fatigue lives of laser powder bed fusion (LPBF) Ti-6Al-4V parts were
investigated. The 197 fatigue bars were printed using the same laser power but
with varied scanning speeds. These actions led to variations in the geometries
of microscale pores, and such variations were characterized using
micro-computed tomography. To generate differences in surface roughness in
fatigue bars, half of the samples were grit-blasted and the other half
machined. Fatigue behaviors were analyzed with respect to surface roughness and
statistics of the pores. For the grit-blasted samples, the contour laser scan
in the LPBF strategy led to a pore-depletion zone isolating surface and
internal pores with different features. For the machined samples, where surface
pores resemble internal pores, the fatigue life was highly correlated with the
average pore size and projected pore area in the plane perpendicular to the
stress direction. Finally, a machine learning model using a drop-out neural
network (DONN) was employed to establish a link between surface and pore
features to the fatigue data (logN), and good prediction accuracy was
demonstrated. Besides predicting fatigue lives, the DONN can also estimate the
prediction uncertainty.
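A drop-out network yields the quoted prediction uncertainty by keeping dropout active at test time (Monte Carlo dropout). A minimal sketch with placeholder feature dimensions rather than the paper's surface and pore inputs:

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(12, 64), nn.ReLU(), nn.Dropout(0.2),
                    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, 1))

def predict_with_uncertainty(net, x, n_samples=100):
    net.train()                                  # keep dropout active at prediction time
    with torch.no_grad():
        preds = torch.stack([net(x) for _ in range(n_samples)])
    return preds.mean(0), preds.std(0)           # predicted logN and its spread

x = torch.randn(5, 12)                           # placeholder roughness + pore features
mean_logN, std_logN = predict_with_uncertainty(net, x)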
【2】 RibSeg Dataset and Strong Point Cloud Baselines for Rib Segmentation from CT Scans
标题:用于CT肋骨分割的RibSeg数据集和强点云基线
链接:https://arxiv.org/abs/2109.09521
作者:Jiancheng Yang,Shixuan Gu,Donglai Wei,Hanspeter Pfister,Bingbing Ni
机构:Shanghai Jiao Tong University, Shanghai, China; Dianei Technology, Shanghai, China; Harvard University, Cambridge MA, USA
备注:MICCAI 2021. The dataset, code, and model are available at this https URL
摘要:计算机断层扫描(CT)扫描中的手动肋骨检查在临床上至关重要,但需要大量劳动,因为在3D体积中,24根肋骨通常是拉长和倾斜的。自动肋骨分割方法可以通过肋骨测量和可视化来加速这一过程。然而,现有技术大多使用内部标记的数据集,这些数据集公开不可用,并且工作在计算效率低下的密集3D体积上。为了解决这些问题,我们开发了一个名为RibSeg的标记肋骨分割基准,包括来自公共数据集的490个CT扫描(11719根肋骨)。对于地面真值生成,我们使用现有的基于形态学的算法,并手动细化其结果。然后,考虑到三维体中肋骨的稀疏性,我们对输入的稀疏体素进行阈值化和采样,并设计了一种基于点云的基线肋骨分割方法。所提出的方法实现了最先进的分割性能(Dice约95%),且效率显著(比现有技术快10~40倍)。RibSeg数据集以及PyTorch代码和模型可在 https://github.com/M3DV/RibSeg 获取。
摘要:Manual rib inspections in computed tomography (CT) scans are clinically
critical but labor-intensive, as 24 ribs are typically elongated and oblique in
3D volumes. Automatic rib segmentation methods can speed up the process through
rib measurement and visualization. However, prior arts mostly use in-house
labeled datasets that are publicly unavailable and work on dense 3D volumes
that are computationally inefficient. To address these issues, we develop a
labeled rib segmentation benchmark, named \emph{RibSeg}, including 490 CT scans
(11,719 individual ribs) from a public dataset. For ground truth generation, we
used existing morphology-based algorithms and manually refined their results.
Then, considering the sparsity of ribs in 3D volumes, we thresholded and
sampled sparse voxels from the input and designed a point cloud-based baseline
method for rib segmentation. The proposed method achieves state-of-the-art
segmentation performance (Dice~$\approx95\%$) with significant efficiency
($10\sim40\times$ faster than prior art). The RibSeg dataset, code, and model
in PyTorch are available at https://github.com/M3DV/RibSeg.
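The thresholding-and-sampling step can be sketched as follows (the intensity threshold and point budget are illustrative assumptions, not values from the paper):

import numpy as np

def ct_to_point_cloud(volume, threshold=200.0, n_points=30000, seed=0):
    # Keep high-intensity (bone-like) voxels and subsample them into a sparse point cloud.
    coords = np.argwhere(volume >= threshold).astype(np.float32)  # (N, 3) voxel indices
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(coords), size=min(n_points, len(coords)), replace=False)
    pts = coords[idx]
    return (pts - pts.mean(0)) / pts.std(0)      # normalized input for a point network

volume = np.random.uniform(-1000, 1500, size=(64, 256, 256))  # placeholder CT in HU
points = ct_to_point_cloud(volume)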
联邦学习|隐私保护|加密(2篇)
【1】 Improving Fairness for Data Valuation in Federated Learning
标题:提高联合学习中数据评估的公平性
链接:https://arxiv.org/abs/2109.09046
作者:Zhenan Fan,Huang Fang,Zirui Zhou,Jian Pei,Michael P. Friedlander,Changxin Liu,Yong Zhang
摘要:联邦学习是一种新兴的分散式机器学习方案,允许多个数据所有者协作工作,同时确保数据隐私。联邦学习的成功在很大程度上取决于数据所有者的参与。为了维持和鼓励数据所有者的参与,公平评估数据所有者提供的数据的质量并给予相应的奖励至关重要。Wang等人[Federated Learning, 2020]最近提出的联邦Shapley值,是联邦学习框架下的数据价值度量,满足数据估值的许多期望属性。然而,联邦Shapley值的设计中仍然存在潜在的不公平因素,因为具有相同本地数据的两个数据所有者可能不会得到相同的评估。为了提高联邦Shapley值的公平性,我们提出了一种新的度量,即完全联邦Shapley值。其设计依赖于补全一个由数据所有者不同子集的所有可能贡献组成的矩阵。通过利用优化中的概念和工具,可以在温和的条件下证明该矩阵近似低秩。理论分析和实证评估均验证,所提出的度量在许多情况下确实提高了公平性。
摘要:Federated learning is an emerging decentralized machine learning scheme that
allows multiple data owners to work collaboratively while ensuring data
privacy. The success of federated learning depends largely on the participation
of data owners. To sustain and encourage data owners' participation, it is
crucial to fairly evaluate the quality of the data provided by the data owners
and reward them correspondingly. Federated Shapley value, recently proposed by
Wang et al. [Federated Learning, 2020], is a measure for data value under the
framework of federated learning that satisfies many desired properties for data
valuation. However, there are still factors of potential unfairness in the
design of federated Shapley value because two data owners with the same local
data may not receive the same evaluation. We propose a new measure called
completed federated Shapley value to improve the fairness of federated Shapley
value. The design depends on completing a matrix consisting of all the possible
contributions by different subsets of the data owners. By leveraging concepts
and tools from optimization, it is shown under mild conditions that this matrix
is approximately low-rank. Both theoretical analysis and empirical evaluation
verify that the proposed measure does improve fairness in many circumstances.
【2】 Toward Efficient Federated Learning in Multi-Channeled Mobile Edge Network with Layerd Gradient Compression
标题:分层梯度压缩多信道移动边缘网络中高效联邦学习的研究
链接:https://arxiv.org/abs/2109.08819
作者:Haizhou Du,Xiaojie Feng,Qiao Xiang,Haoyu Liu
机构:Shanghai University of Electric Power, Xiamen University
备注:12 pages, 16 figures
摘要:联邦学习(FL)的一个基本问题是如何在高度动态的通信环境下实现最佳的模型性能。现代边缘设备通常可以通过多个通信信道(例如,4G、LTE和5G)连接到边缘FL服务器,这一事实可以缓解此问题。但是,让边缘设备沿多个通道向FL服务器发送本地模型的副本是冗余的、耗时的,并且会浪费资源(例如带宽、电池寿命和金钱成本)。在本文中,基于视频流中的分层编码技术,我们提出了一种新的FL框架,称为分层梯度压缩(LGC)。具体地说,在LGC中,来自设备的局部梯度被编码成若干层,并且每一层被沿着不同的信道发送到FL服务器。FL服务器聚合从设备接收的局部梯度层以更新全局模型,并将结果发送回设备。我们证明了LGC的收敛性,并用LGC形式化地定义了资源高效的联合学习问题。然后,我们为每个设备提出了一种基于学习的算法,以便在每次迭代中动态调整其局部计算(即局部随机下降的数量)和通信决策(即不同层的压缩级别和层到信道的映射)。大量实验的结果表明,与已知的FL机制相比,使用我们的算法,LGC显著减少了训练时间,提高了资源利用率,同时实现了相似的精度。
摘要:A fundamental issue for federated learning (FL) is how to achieve optimal
model performance under highly dynamic communication environments. This issue
can be alleviated by the fact that modern edge devices usually can connect to
the edge FL server via multiple communication channels (e.g., 4G, LTE and 5G).
However, having an edge device send copies of local models to the FL server
along multiple channels is redundant, time-consuming, and would waste resources
(e.g., bandwidth, battery life and monetary cost). In this paper, motivated by
the layered coding techniques in video streaming, we propose a novel FL
framework called layered gradient compression (LGC). Specifically, in LGC,
local gradients from a device are coded into several layers and each layer is
sent to the FL server along a different channel. The FL server aggregates the
received layers of local gradients from devices to update the global model, and
sends the result back to the devices. We prove the convergence of LGC, and
formally define the problem of resource-efficient federated learning with LGC.
We then propose a learning based algorithm for each device to dynamically
adjust its local computation (i.e., the number of local stochastic descent
steps) and communication decisions (i.e., the compression level of different
layers and the layer-to-channel mapping) in each iteration. Results from extensive experiments
show that using our algorithm, LGC significantly reduces the training time,
improves the resource utilization, while achieving a similar accuracy, compared
with well-known FL mechanisms.
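A toy sketch of the layering idea (disjoint top-k index sets of decreasing importance, one per channel; the layer fractions are made up): the server can reconstruct from whichever layers arrive, so a slow channel degrades rather than breaks the update.

import torch

def layered_topk(grad, fractions=(0.01, 0.04, 0.15)):
    # Split a flat gradient into sparse layers of decreasing importance.
    flat = grad.flatten()
    order = flat.abs().argsort(descending=True)
    layers, start = [], 0
    for f in fractions:
        k = max(1, int(f * flat.numel()))
        idx = order[start:start + k]
        layers.append((idx, flat[idx]))          # one (indices, values) layer per channel
        start += k
    return layers

g = torch.randn(10000)
layers = layered_topk(g)
g_hat = torch.zeros_like(g)
for idx, vals in layers[:2]:                     # e.g., only two channels delivered in time
    g_hat[idx] = vals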
推理|分析|理解|解释(7篇)
【1】 Decoupling Long- and Short-Term Patterns in Spatiotemporal Inference
标题:时空推理中长、短时模式的解耦
链接:https://arxiv.org/abs/2109.09506
作者:Junfeng Hu,Yuxuan Liang,Zhencheng Fan,Yifang Yin,Ying Zhang,Roger Zimmermann
机构:School of Computing, National University of Singapore, Singapore, University of Technology Sydney, Australia, School of Computer Science, Northwestern Polytechnical University, China
摘要:传感器是感知环境并在许多方面为智慧城市带来好处的关键,例如提供整个城市区域的实时空气质量信息。然而,先决条件是获得环境的细粒度知识。由于费用不可忽略,物理世界中可以安装的传感器数量是有限的。在本文中,我们提出根据可用传感器的历史和当前观测来推断城市中任何给定位置的实时信息(称为时空推断)。我们的方法将短期和长期模式的建模解耦,依赖于两个主要组件。首先,与以往将空间和时间关系学习分离的研究不同,我们引入了一个联合时空图注意网络来学习跨空间和时间维度的短期依赖关系。其次,我们提出了一种具有时间跳跃的自适应图递归网络来捕获长期模式。自适应邻接矩阵首先作为递归网络的输入进行归纳学习,以学习动态依赖关系。在四个公开的真实世界数据集上的实验结果表明,我们的方法将最新基线的平均绝对误差降低了5%~12%。
摘要:Sensors are the key to sensing the environment and imparting benefits to
smart cities in many aspects, such as providing real-time air quality
information throughout an urban area. However, a prerequisite is to obtain
fine-grained knowledge of the environment. There is a limit to how many sensors
can be installed in the physical world due to non-negligible expenses. In this
paper, we propose to infer real-time information of any given location in a
city based on historical and current observations from the available sensors
(termed spatiotemporal inference). Our approach decouples the modeling of
short-term and long-term patterns, relying on two major components. Firstly,
unlike previous studies that separated the spatial and temporal relation
learning, we introduce a joint spatiotemporal graph attention network that
learns the short-term dependencies across both the spatial and temporal
dimensions. Secondly, we propose an adaptive graph recurrent network with a
time skip for capturing long-term patterns. The adaptive adjacency matrices are
learned inductively first as the inputs of a recurrent network to learn dynamic
dependencies. Experimental results on four public real-world datasets show that
our method reduces state-of-the-art baseline mean absolute errors by 5%~12%.
【2】 Machine learning methods for modelling and analysis of time series signals in geoinformatics
标题:地理信息学中时间序列信号建模与分析的机器学习方法
链接:https://arxiv.org/abs/2109.09499
作者:Maria Kaselimi
机构:Dipl. Rural and Surveying Engineer, NTUA, Supervisor: NIKOLAOS DOULAMIS, Assoc. Professor, NTUA
备注:arXiv admin note: text overlap with arXiv:2004.13408 by other authors
摘要:本文通过对比分析,评估了几种深度学习(DL)体系结构在大量不同性质和不同应用的时间序列数据集上的性能。本文讨论了两个主要的富有成果的研究领域,这两个领域的战略选择是为了解决当前吸引大地测量界兴趣的跨学科研究重点。第一个问题涉及电离层总电子含量(TEC)建模,这是许多实时全球导航系统卫星(GNSS)应用中的一个重要问题。关于电离层变化的可靠和快速的知识变得越来越重要。单频接收机和卫星导航系统的全球导航卫星系统用户需要精确的校正,以消除电离层造成的信号退化影响。利用信号处理技术进行电离层建模是本论文讨论的主题。讨论的下一个问题是能源分解,这是能源效率和能源消费意识的一个重要问题。可靠、快速地了解家用电器层面的住宅能耗如今变得越来越重要,这是防止能源浪费的重要缓解措施。能量分解或非侵入性负载监测(NILM)是一个单通道盲源分离问题,其任务是在给定总能量消耗的情况下估计每个电器的消耗。对于这两个问题,提出了各种深度学习模型(DL),涵盖了所研究问题的各个方面,而实验结果表明,与现有技术相比,所提出的方法具有优势。
摘要:This dissertation provides a comparative analysis that evaluates the
performance of several deep learning (DL) architectures on a large number of
time series datasets of different nature and for different applications. Two
main fruitful research fields are discussed here, strategically chosen to
address current cross-disciplinary research priorities attracting the interest
of the geodetic community. The first problem is ionospheric Total Electron
Content (TEC) modeling, an important issue in many real-time Global Navigation
Satellite System (GNSS) applications.
Reliable and fast knowledge about ionospheric variations becomes increasingly
important. GNSS users of single frequency receivers and satellite navigation
systems need accurate corrections to remove signal degradation effects caused
by the ionosphere. Ionospheric modeling using signal processing techniques is
the subject of discussion in the present contribution. The next problem under
discussion is energy disaggregation which is an important issue for energy
efficiency and energy consumption awareness. Reliable and fast knowledge about
residential energy consumption at appliance level becomes increasingly
important nowadays and it is an important mitigation measure to prevent energy
wastage. Energy disaggregation or Nonintrusive load monitoring (NILM) is a
single channel blind source separation problem where the task is to estimate
the consumption of each electrical appliance given the total energy
consumption. For both problems, various deep learning (DL) models are proposed
that cover various aspects of the problem under study, and experimental results
indicate the proposed methods' superiority over the current state of the art.
【3】 MM-Deacon: Multimodal molecular domain embedding analysis via contrastive learning
标题:MM-Deacon:基于对比学习的多模态分子域嵌入分析
链接:https://arxiv.org/abs/2109.08830
作者:Zhihui Guo,Pramod Kumar Sharma,Liang Du,Robin Abraham
机构:Microsoft Corporation, Redmond, WA
摘要:分子表征学习在化学信息学中起着至关重要的作用。最近,基于语言模型的方法作为传统的专家设计的分子编码特征的替代方法已经很流行。然而,这些方法仅利用单一模态来表示分子。鉴于给定的分子可以通过不同的模态来描述,例如简化分子线性输入规范(SMILES)、国际纯粹与应用化学联合会命名法(IUPAC)和IUPAC国际化学标识符(InChI),我们提出了一种称为MM-Deacon的多模态分子嵌入生成方法(通过对比学习进行多模态分子域嵌入分析)。MM-Deacon使用SMILES和IUPAC分子表示作为两种不同的模态进行训练。首先,SMILES和IUPAC字符串分别由两种不同的基于Transformer的语言模型独立编码;然后利用对比损失,使属于同一分子的不同模态编码表示彼此靠近,并使属于不同分子的嵌入彼此远离。我们评估了我们的分子嵌入在分子聚类、跨模态分子搜索、药物相似性评估和药物相互作用任务中的稳健性。
摘要:Molecular representation learning plays an essential role in cheminformatics.
Recently, language model-based approaches have been popular as an alternative
to traditional expert-designed features to encode molecules. However, these
approaches only utilize a single modality for representing molecules. Driven by
the fact that a given molecule can be described through different modalities
such as Simplified Molecular Line Entry System (SMILES), The International
Union of Pure and Applied Chemistry (IUPAC), and The IUPAC International
Chemical Identifier (InChI), we propose a multimodal molecular embedding
generation approach called MM-Deacon (multimodal molecular domain embedding
analysis via contrastive learning). MM-Deacon is trained using SMILES and IUPAC
molecule representations as two different modalities. First, SMILES and IUPAC
strings are encoded by using two different transformer-based language models
independently, then the contrastive loss is utilized to bring these encoded
representations from different modalities closer to each other if they belong
to the same molecule, and to push embeddings farther from each other if they
belong to different molecules. We evaluate the robustness of our molecule
embeddings on molecule clustering, cross-modal molecule search, drug similarity
assessment and drug-drug interaction tasks.
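The cross-modal objective can be sketched as a symmetric InfoNCE loss (the placeholder tensors stand in for the outputs of the two transformer encoders; the temperature is an assumed hyperparameter):

import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(z_smiles, z_iupac, temperature=0.1):
    # Row i of each batch is the same molecule encoded in two modalities.
    z1 = F.normalize(z_smiles, dim=1)
    z2 = F.normalize(z_iupac, dim=1)
    logits = z1 @ z2.t() / temperature           # pairwise cosine similarities
    labels = torch.arange(len(z1))               # matching rows are the positives
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

loss = cross_modal_contrastive_loss(torch.randn(16, 128), torch.randn(16, 128))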
【4】 Probabilistic Inference of Simulation Parameters via Parallel Differentiable Simulation
标题:基于并行差分仿真的仿真参数概率推断
链接:https://arxiv.org/abs/2109.08815
作者:Eric Heiden,Christopher E. Denniston,David Millard,Fabio Ramos,Gaurav S. Sukhatme
机构:Department of Computer Science, University of Southern California
备注:Extended version. Submitted to ICRA 2022
摘要:为了准确地再现真实世界的测量结果,模拟器需要有一个适当的物理系统模型,并需要识别模型的参数。我们通过贝叶斯推理方法解决后一个参数估计问题,该方法在给定真实传感器测量的条件下近似仿真参数的后验分布。通过多重打靶(multiple-shooting)公式扩展常用的轨迹高斯似然模型,我们选择的基于粒子的推理算法Stein变分梯度下降能够识别高度非线性的欠驱动系统。我们利用GPU代码生成和可微模拟并行评估多个粒子的似然及其梯度。我们的算法比可比较的基线更准确地推断模拟参数的非参数分布,并通过基于梯度的优化有效地处理参数约束。我们在几个物理实验中评估了估计性能。在一个欠驱动机构上,7自由度机械臂激励具有未知质量配置的物体,我们演示了我们的推理技术如何识别参数之间的对称性并提供高度准确的预测。项目网站:https://uscresl.github.io/prob-diff-sim
摘要:To accurately reproduce measurements from the real world, simulators need to
have an adequate model of the physical system and require that the parameters
of the model be identified.
We address the latter problem of estimating parameters through a Bayesian
inference approach that approximates a posterior distribution over simulation
parameters given real sensor measurements. By extending the commonly used
Gaussian likelihood model for trajectories via the multiple-shooting
formulation, our chosen particle-based inference algorithm Stein Variational
Gradient Descent is able to identify highly nonlinear, underactuated systems.
We leverage GPU code generation and differentiable simulation to evaluate the
likelihood and its gradient for many particles in parallel.
Our algorithm infers non-parametric distributions over simulation parameters
more accurately than comparable baselines and handles constraints over
parameters efficiently through gradient-based optimization. We evaluate
estimation performance on several physical experiments. On an underactuated
mechanism where a 7-DOF robot arm excites an object with an unknown mass
configuration, we demonstrate how our inference technique can identify
symmetries between the parameters and provide highly accurate predictions.
Project website: https://uscresl.github.io/prob-diff-sim
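For reference, one Stein variational gradient descent update over a set of parameter particles looks as follows (a generic NumPy sketch with an RBF kernel, not the paper's GPU implementation; grad_log_p is the score of the posterior, here a toy standard normal):

import numpy as np

def svgd_step(particles, grad_log_p, stepsize=0.1, bandwidth=1.0):
    diff = particles[:, None, :] - particles[None, :, :]      # (n, n, d)
    K = np.exp(-(diff ** 2).sum(-1) / (2 * bandwidth ** 2))   # RBF kernel matrix
    drive = K @ grad_log_p(particles)                         # pulls particles toward high density
    repulse = (K[..., None] * diff / bandwidth ** 2).sum(axis=1)  # keeps them spread out
    return particles + stepsize * (drive + repulse) / len(particles)

particles = np.random.randn(50, 3)                            # simulation-parameter particles
for _ in range(200):
    particles = svgd_step(particles, lambda x: -x)            # toy posterior: standard normal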
【5】 Understanding neural networks with reproducing kernel Banach spaces
标题:用再生核Banach空间理解神经网络
链接:https://arxiv.org/abs/2109.09710
作者:Francesca Bartolucci,Ernesto De Vito,Lorenzo Rosasco,Stefano Vigogna
摘要:描述神经网络对应的函数空间可以提供一种理解其性质的方法。在本文中,我们讨论了如何利用再生核Banach空间理论来解决这一难题。特别地,我们证明了一类广泛的再生核Banach空间的representer定理,该空间允许一个合适的积分表示,并且包含一个可能无限宽的隐层神经网络。此外,我们还证明了,对于一类合适的ReLU激活函数,相应的再生核Banach空间中的范数可以用有界实测度的逆Radon变换来刻画,范数由测度的总变分范数给出。我们的分析简化并扩展了[34,29,30]中的最新结果。
摘要:Characterizing the function spaces corresponding to neural networks can
provide a way to understand their properties. In this paper we discuss how the
theory of reproducing kernel Banach spaces can be used to tackle this
challenge. In particular, we prove a representer theorem for a wide class of
reproducing kernel Banach spaces that admit a suitable integral representation
and include one hidden layer neural networks of possibly infinite width.
Further, we show that, for a suitable class of ReLU activation functions, the
norm in the corresponding reproducing kernel Banach space can be characterized
in terms of the inverse Radon transform of a bounded real measure, with norm
given by the total variation norm of the measure. Our analysis simplifies and
extends recent results in [34,29,30].
【6】 Machine Learning-Based Estimation and Goodness-of-Fit for Large-Scale Confirmatory Item Factor Analysis
标题:基于机器学习的大规模验证性项目因子分析的估计和拟合优度
链接:https://arxiv.org/abs/2109.09500
作者:Christopher J. Urban,Daniel J. Bauer
机构:L.L. Thurstone Psychometric Laboratory in the Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill
摘要:我们研究了新的参数估计和拟合优度(GOF)评估方法,用于大规模验证性项目因子分析(IFA),包括许多受访者、项目和潜在因素。对于参数估计,我们将Urban和Bauer(2021)的探索性IFA深度学习算法扩展到验证性设置,展示了如何处理用户定义的负荷和因子相关性约束。对于GOF评估,我们探索了新的基于模拟的测试和指标。特别地,我们考虑分类器的两个样本测试(C2ST)的扩展,该方法测试机器学习分类器是否能够区分所观察到的数据和从拟合的IFA模型采样的合成数据。C2ST提供了一个灵活的框架,集成了整体模型拟合、分段拟合和个人拟合。建议的扩展包括基于C2ST的近似拟合测试,其中用户指定观测数据与合成数据的百分比,以及基于C2ST的相对拟合指数,该指数与结构方程建模中使用的相对拟合指数在精神上类似。通过模拟研究,我们首先表明,随着样本量的增加,Urban和Bauer(2021)算法的验证性扩展产生了更精确的参数估计,并在更短的时间内获得了与最先进的验证性IFA估计程序相当的估计。接下来,我们展示了基于C2ST的近似拟合检验控制了经验I型错误率,并在潜在因素的数量被错误指定时进行检测。最后,我们实证研究了基于C2ST的相对拟合指数的抽样分布如何依赖于样本量。
摘要:We investigate novel parameter estimation and goodness-of-fit (GOF)
assessment methods for large-scale confirmatory item factor analysis (IFA) with
many respondents, items, and latent factors. For parameter estimation, we
extend Urban and Bauer's (2021) deep learning algorithm for exploratory IFA to
the confirmatory setting by showing how to handle user-defined constraints on
loadings and factor correlations. For GOF assessment, we explore new
simulation-based tests and indices. In particular, we consider extensions of
the classifier two-sample test (C2ST), a method that tests whether a machine
learning classifier can distinguish between observed data and synthetic data
sampled from a fitted IFA model. The C2ST provides a flexible framework that
integrates overall model fit, piece-wise fit, and person fit. Proposed
extensions include a C2ST-based test of approximate fit in which the user
specifies what percentage of observed data can be distinguished from synthetic
data as well as a C2ST-based relative fit index that is similar in spirit to
the relative fit indices used in structural equation modeling. Via simulation
studies, we first show that the confirmatory extension of Urban and Bauer's
(2021) algorithm produces more accurate parameter estimates as the sample size
increases and obtains comparable estimates to a state-of-the-art confirmatory
IFA estimation procedure in less time. We next show that the C2ST-based test of
approximate fit controls the empirical type I error rate and detects when the
number of latent factors is misspecified. Finally, we empirically investigate
how the sampling distribution of the C2ST-based relative fit index depends on
the sample size.
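The core C2ST recipe can be sketched with scikit-learn (the random-forest classifier is an arbitrary stand-in; any classifier works): train it to separate observed from synthetic rows, then test the held-out accuracy against the chance level of 0.5.

import numpy as np
from scipy.stats import binomtest
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def c2st(observed, synthetic, seed=0):
    X = np.vstack([observed, synthetic])
    y = np.r_[np.ones(len(observed)), np.zeros(len(synthetic))]
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5,
                                          random_state=seed, stratify=y)
    acc = RandomForestClassifier(random_state=seed).fit(Xtr, ytr).score(Xte, yte)
    # Under the null (the fitted model matches the data), accuracy stays near chance.
    pval = binomtest(round(acc * len(yte)), len(yte), 0.5, alternative="greater").pvalue
    return acc, pval

acc, pval = c2st(np.random.randn(500, 20), np.random.randn(500, 20))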
【7】 Sibyl: Understanding and Addressing the Usability Challenges of Machine Learning In High-Stakes Decision Making
标题:Sibyl:在高风险决策中理解和解决机器学习的可用性挑战
链接:https://arxiv.org/abs/2103.02071
作者:Alexandra Zytek,Dongyu Liu,Rhema Vaithianathan,Kalyan Veeramachaneni
机构:Auckland University of Technology (Rhema Vaithianathan)
备注:Updated to version presented at VIS 2021
摘要:机器学习(ML)正被应用于一系列不同且不断增长的领域。在许多情况下,领域专家——他们通常没有ML或数据科学方面的专业知识——被要求使用ML预测来做出高风险的决策。结果可能会出现多个ML可用性挑战,例如用户对模型缺乏信任,无法协调人类的ML分歧,以及对将复杂问题过度简化为单个算法输出的道德担忧。在本文中,我们通过与儿童福利筛查者的一系列合作,调查了儿童福利筛查领域中存在的ML可用性挑战。在ML科学家、可视化研究人员和领域专家(儿童筛查者)之间的迭代设计过程之后,我们首先确定了四个关键的ML挑战,并磨练出一种有前途的可解释的ML技术来解决它们(局部因素贡献)。然后,我们实施并评估了可视化分析工具Sibyl,以提高本地因素贡献的可解释性和交互性。我们的工具的有效性通过两项正式的用户研究得到证明,这两项研究分别有12名非专家参与者和13名专家参与者。我们收集了宝贵的反馈,从中我们列出了一个设计含义列表,作为研究人员的有用指南,这些研究人员旨在为儿童福利筛查者和其他类似领域专家部署的ML预测模型开发一个可解释的交互式可视化工具。
摘要:Machine learning (ML) is being applied to a diverse and ever-growing set of
domains. In many cases, domain experts - who often have no expertise in ML or
data science - are asked to use ML predictions to make high-stakes decisions.
Multiple ML usability challenges can appear as a result, such as lack of user
trust in the model, inability to reconcile human-ML disagreement, and ethical
concerns about oversimplification of complex problems to a single algorithm
output. In this paper, we investigate the ML usability challenges that present
in the domain of child welfare screening through a series of collaborations
with child welfare screeners. Following the iterative design process between
the ML scientists, visualization researchers, and domain experts (child
screeners), we first identified four key ML challenges and homed in on one
promising explainable ML technique to address them (local factor
contributions). Then we implemented and evaluated our visual analytics tool,
Sibyl, to increase the interpretability and interactivity of local factor
contributions. The effectiveness of our tool is demonstrated by two formal user
studies with 12 non-expert participants and 13 expert participants
respectively. Valuable feedback was collected, from which we composed a list of
design implications as a useful guideline for researchers who aim to develop an
interpretable and interactive visualization tool for ML prediction models
deployed for child welfare screeners and other similar domain experts.
检测相关(7篇)
【1】 A2Log: Attentive Augmented Log Anomaly Detection
标题:A2Log:注意力增强的日志异常检测
链接:https://arxiv.org/abs/2109.09537
作者:Thorsten Wittkopp,Alexander Acker,Sasho Nedelkoski,Jasmin Bogatinovski,Dominik Scheinert,Wu Fan,Odej Kao
机构:Technische Universität Berlin
备注:This paper has been accepted for HICSS 2022 and will appear in the conference proceedings
摘要:异常检测对于IT服务的可靠性和可服务性变得越来越重要。由于日志行记录IT服务执行期间的事件,因此它们是诊断的主要来源。因此,无监督方法提供了显著的好处,因为并非所有异常都能在训练时被知道。现有的无监督方法需要异常实例来获得异常检测任务所需的合适决策边界。这一要求带来了实际限制。因此,我们开发了A2Log,这是一种无监督的异常检测方法,包括两个步骤:异常评分和异常决策。首先,我们利用自注意神经网络对每条日志消息进行评分。其次,我们根据可用的正常训练数据的数据扩充来设置决策边界。该方法在三个公开数据集和一个行业数据集上进行了评估。我们表明,我们的方法优于现有的方法。此外,我们利用可用的异常示例设置最佳决策边界以获得强基线。我们证明了我们的方法,即在不使用异常示例的情况下确定决策边界,可以达到强基线的分数。
摘要:Anomaly detection becomes increasingly important for the dependability and
serviceability of IT services. As log lines record events during the execution
of IT services, they are a primary source for diagnostics. Unsupervised
methods thus provide a significant benefit since not all anomalies can
be known at training time. Existing unsupervised methods need anomaly examples
to obtain a suitable decision boundary required for the anomaly detection task.
This requirement poses practical limitations. Therefore, we develop A2Log,
which is an unsupervised anomaly detection method consisting of two steps:
Anomaly scoring and anomaly decision. First, we utilize a self-attention neural
network to perform the scoring for each log message. Second, we set the
decision boundary based on data augmentation of the available normal training
data. The method is evaluated on three publicly available datasets and one
industry dataset. We show that our approach outperforms existing methods.
Furthermore, we utilize available anomaly examples to set optimal decision
boundaries to acquire strong baselines. We show that our approach, which
determines decision boundaries without utilizing anomaly examples, can reach
scores of the strong baselines.
【2】 Anomaly Detection in Radar Data Using PointNets
标题:基于点网的雷达数据异常检测
链接:https://arxiv.org/abs/2109.09401
作者:Thomas Griebel,Dominik Authaler,Markus Horn,Matti Henning,Michael Buchholz,Klaus Dietmayer
机构:Institute of Measurement, Ulm University
备注:Accepted for presentation at the 2021 IEEE 24th International Conference on Intelligent Transportation Systems (ITSC), September 19-22, 2021, Indianapolis, USA
摘要:对于自动驾驶,雷达是一种重要的传感器类型。一方面,雷达可以直接测量环境中目标的径向速度。另一方面,在文献中,雷达传感器因其对几种恶劣天气条件的鲁棒性而闻名。然而,不利的一面是,雷达容易受到鬼目标或杂波的影响,这些鬼目标或杂波可能是由多种不同的原因造成的,例如,环境中的反射表面。例如,重影目标可能导致错误的目标检测。为此,希望尽早在雷达数据中识别异常目标。在这项工作中,我们提出了一种基于点网的异常雷达目标检测方法。通过修改由我们的任务驱动的PointNet体系结构,我们开发了一种新的分组变体,它有助于多形式分组模块。我们的方法在城市场景中的真实数据集上进行了评估,显示了检测异常雷达目标的良好结果。
摘要:For autonomous driving, radar is an important sensor type. On the one hand,
radar offers a direct measurement of the radial velocity of targets in the
environment. On the other hand, in literature, radar sensors are known for
their robustness against several kinds of adverse weather conditions. However,
on the downside, radar is susceptible to ghost targets or clutter, which can
arise from several different causes, e.g., reflective surfaces in the
environment. Ghost targets, for instance, can result in erroneous object
detections. To this end, it is desirable to identify anomalous targets as early
as possible in radar data. In this work, we present an approach based on
PointNets to detect anomalous radar targets. Modifying the
PointNet-architecture driven by our task, we developed a novel grouping variant
which contributes to a multi-form grouping module. Our method is evaluated on a
real-world dataset in urban scenarios and shows promising results for the
detection of anomalous radar targets.
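A minimal PointNet-style classifier over radar targets (a sketch with assumed per-target features, omitting the paper's multi-form grouping module) shows the order-invariant core such approaches build on:

import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    # Shared per-point MLP followed by symmetric max-pooling.
    def __init__(self, in_dim=4, n_classes=2):   # e.g., x, y, radial velocity, RCS
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Conv1d(in_dim, 64, 1), nn.ReLU(),
                                       nn.Conv1d(64, 128, 1), nn.ReLU())
        self.head = nn.Linear(128, n_classes)
    def forward(self, pts):                      # pts: (batch, in_dim, n_points)
        feat = self.point_mlp(pts).max(dim=2).values   # order-invariant global feature
        return self.head(feat)

logits = TinyPointNet()(torch.randn(8, 4, 128))  # anomalous vs. normal targets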
【3】 Deep Spatio-temporal Sparse Decomposition for Trend Prediction and Anomaly Detection in Cardiac Electrical Conduction
标题:用于心电传导趋势预测和异常检测的深时空稀疏分解
链接:https://arxiv.org/abs/2109.09317
作者:Xinyu Zhao,Hao Yan,Zhiyong Hu,Dongping Du
机构:School of Computing, Arizona State University
摘要:心脏组织间的电传导通常用偏微分方程建模,即反应扩散方程,其中反应项描述细胞刺激,扩散项描述电传播。在这种非线性动态系统中,检测和识别产生异常电脉冲的心肌细胞对于有效治疗和规划非常重要。为了建立非线性动力学模型,模拟已广泛应用于心脏研究和临床研究,以研究心脏疾病的机制和开发新的治疗设计。然而,现有的心脏模型具有很高的复杂度,并且仿真往往非常耗时。我们提出了一种深度时空稀疏分解(DSTSD)方法,利用深度时空模型绕过耗时的心脏偏微分方程,并检测异常(即故障心脏细胞)的时间和位置。该方法通过Courtemanche-Ramirez-Nattel(CRN)模型生成的数据集得到验证,该模型广泛用于模拟跨膜电位在细胞膜上的传播。提出的DSTSD在时空平均趋势预测和异常检测方面达到了最佳精度。
摘要:Electrical conduction among cardiac tissue is commonly modeled with partial
differential equations, i.e., the reaction-diffusion equation, where the
reaction term describes cellular stimulation and the diffusion term describes
electrical propagation. Detecting and identifying cardiac cells that produce
abnormal electrical impulses in such nonlinear dynamic systems is important for
efficient treatment and planning. To model the nonlinear dynamics, simulation
has been widely used in both cardiac research and clinical study to investigate
cardiac disease mechanisms and develop new treatment designs. However, existing
cardiac models have a great level of complexity, and the simulation is often
time-consuming. We propose a deep spatio-temporal sparse decomposition (DSTSD)
approach to bypass the time-consuming cardiac partial differential equations
with the deep spatio-temporal model and detect the time and location of the
anomaly (i.e., malfunctioning cardiac cells). This approach is validated from
the data set generated from the Courtemanche-Ramirez-Nattel (CRN) model, which
is widely used to model the propagation of the transmembrane potential across
the cross neuron membrane. The proposed DSTSD achieved the best accuracy in
terms of spatio-temporal mean trend prediction and anomaly detection.
【4】 Unified and Multilingual Author Profiling for Detecting Haters
标题:用于检测仇恨的统一的多语种作者侧写
链接:https://arxiv.org/abs/2109.09233
作者:Ipek Baris Schlicht,Angel Felipe Magnossão de Paula
机构:Universitat Politècnica de València, Spain
备注:None
摘要:本文提出了一个统一的用户分析框架,通过处理他们的推文来识别仇恨言论传播者,而不考虑语言。该框架使用句子转换器对tweet进行编码,并应用注意机制来选择重要tweet以学习用户配置文件。此外,注意层通过在标记和帖子级别上产生注意权重,有助于解释为什么用户是仇恨言论传播者。我们提出的模型优于最先进的多语言Transformer模型。
摘要:This paper presents a unified user profiling framework to identify hate
speech spreaders by processing their tweets regardless of the language. The
framework encodes the tweets with sentence transformers and applies an
attention mechanism to select important tweets for learning user profiles.
Furthermore, the attention layer helps to explain why a user is a hate speech
spreader by producing attention weights at both token and post level. Our
proposed model outperformed the state-of-the-art multilingual transformer
models.
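The attention-over-posts idea can be sketched as follows (dimensions assume sentence-transformer embeddings of size 384; the linear scorer is illustrative). The returned weights indicate which tweets drove the decision, which is what makes the profile explainable at the post level:

import torch
import torch.nn as nn

class UserProfiler(nn.Module):
    def __init__(self, dim=384, n_classes=2):
        super().__init__()
        self.score = nn.Linear(dim, 1)           # post-level attention scorer
        self.classify = nn.Linear(dim, n_classes)
    def forward(self, tweet_embs):               # (n_tweets, dim) sentence embeddings
        weights = torch.softmax(self.score(tweet_embs), dim=0)
        profile = (weights * tweet_embs).sum(dim=0)   # attention-weighted user profile
        return self.classify(profile), weights.squeeze(-1)

logits, attention = UserProfiler()(torch.randn(100, 384))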
【5】 The Devil Is in the Details: An Efficient Convolutional Neural Network for Transport Mode Detection
标题:魔鬼在细节:一种用于交通方式检测的高效卷积神经网络
链接:https://arxiv.org/abs/2109.09504
作者:Hugues Moreau,Andréa Vassilev,Liming Chen
备注:None
摘要:交通方式检测是一个分类问题,旨在设计一种算法,能够根据用户的多模态信号(GPS和/或惯性传感器)推断其交通方式。它有许多应用,如碳足迹跟踪、出行行为分析或实时门到门智能规划。当前大多数方法都依赖于使用机器学习技术的分类步骤,并且,与许多其他分类问题一样,深度学习方法通常比使用手工特征的传统机器学习方法取得更好的结果。然而,深度模型有一个明显的缺点:无论是在内存占用还是处理成本方面,它们通常都很重。我们证明了一个小型的、经过优化的模型可以和当前的深度模型表现得一样好。在对GeoLife和SHL 2018数据集的实验中,我们获得了只有数万个参数的模型,即参数量和运算量比最先进的网络少10到1000倍,同时仍能达到可比的性能。我们还使用上述数据集表明,当前用于处理不同长度信号的预处理是次优的,并提供了更好的替代方案。最后,我们介绍了一种让较轻的卷积神经网络处理不同长度信号的方法,从而无需使用较重的递归神经网络。
摘要:Transport mode detection is a classification problem aiming to design an
algorithm that can infer the transport mode of a user given multimodal signals
(GPS and/or inertial sensors). It has many applications, such as carbon
footprint tracking, mobility behaviour analysis, or real-time door-to-door
smart planning. Most current approaches rely on a classification step using
Machine Learning techniques, and, like in many other classification problems,
deep learning approaches usually achieve better results than traditional
machine learning ones using handcrafted features. Deep models, however, have a
notable downside: they are usually heavy, both in terms of memory space and
processing cost. We show that a small, optimized model can perform as well as a
current deep model. During our experiments on the GeoLife and SHL 2018
datasets, we obtain models with tens of thousands of parameters, that is, 10 to
1,000 times fewer parameters and operations than state-of-the-art networks,
while still reaching comparable performance. We also show, using the
aforementioned datasets, that the current preprocessing used to deal with
signals of different lengths is suboptimal, and we provide better replacements.
Finally, we introduce a way to use signals with different lengths with the
lighter Convolutional neural networks, without using the heavier Recurrent
Neural Networks.
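One standard way to let a light CNN accept signals of different lengths, in the spirit described (a sketch, not the paper's exact network), is to end the convolutional stack with adaptive global pooling:

import torch
import torch.nn as nn

class LightTMD(nn.Module):
    def __init__(self, in_channels=6, n_modes=8):    # e.g., 3-axis accelerometer + gyroscope
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 16, 5, padding=2), nn.ReLU(),
            nn.Conv1d(16, 32, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))                 # collapses any temporal length to 1
        self.classifier = nn.Linear(32, n_modes)
    def forward(self, x):
        return self.classifier(self.features(x).squeeze(-1))

model = LightTMD()
out_short = model(torch.randn(2, 6, 300))            # the same weights handle both lengths
out_long = model(torch.randn(2, 6, 4500))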
【6】 DECORAS: detection and characterization of radio-astronomical sources using deep learning
标题:DECORAS:利用深度学习探测和表征射电天文信号源
链接:https://arxiv.org/abs/2109.09077
作者:S. Rezaei,J. P. McKean,M. Biehl,A. Javadpour
机构:Kapteyn Astronomical Institute, University of Groningen, Groningen, the Netherlands; Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen, Groningen, the Netherlands
摘要:我们介绍DECORAS,一种基于深度学习的方法,用于从甚长基线干涉测量(VLBI)观测中检测点源和展源。我们的方法基于编码器-解码器神经网络架构,该架构使用少量卷积层,为源检测提供可扩展的解决方案。此外,DECORAS还根据检测到的源的位置、有效半径和峰值亮度对源进行表征。我们使用基于20厘米波段真实甚长基线阵列(VLBA)观测的图像对网络进行了训练和测试。此外,这些图像没有经过任何事先的去卷积步骤,并通过傅里叶变换与可见度数据直接相关。我们发现,与传统的源检测算法相比,DECORAS生成的源表具有更好的整体完整性和纯度。DECORAS在7.5σ水平上是完整的,在5.5σ水平上可靠性提升了近一倍。我们发现DECORAS可以将检测到的源的位置恢复到0.61±0.69 mas以内,并且分别有98%和94%的源的有效半径和峰值表面亮度恢复到20%以内。总之,我们发现DECORAS为未来的宽视场VLBI巡天提供了可靠的源检测和表征解决方案。
摘要:We present DECORAS, a deep learning based approach to detect both point and
extended sources from Very Long Baseline Interferometry (VLBI) observations.
Our approach is based on an encoder-decoder neural network architecture that
uses a low number of convolutional layers to provide a scalable solution for
source detection. In addition, DECORAS performs source characterization in
terms of the position, effective radius and peak brightness of the detected
sources. We have trained and tested the network with images that are based on
realistic Very Long Baseline Array (VLBA) observations at 20 cm. Also, these
images have not gone through any prior de-convolution step and are directly
related to the visibility data via a Fourier transform. We find that the source
catalog generated by DECORAS has a better overall completeness and purity, when
compared to a traditional source detection algorithm. DECORAS is complete at
the 7.5$\sigma$ level, and has an almost factor of two improvement in
reliability at 5.5$\sigma$. We find that DECORAS can recover the position of
the detected sources to within 0.61 $\pm$ 0.69 mas, and the effective radius
and peak surface brightness are recovered to within 20 per cent for 98 and 94
per cent of the sources, respectively. Overall, we find that DECORAS provides a
reliable source detection and characterization solution for future wide-field
VLBI surveys.
【7】 Asymmetric 3D Context Fusion for Universal Lesion Detection
标题:用于通用病变检测的非对称三维上下文融合
链接:https://arxiv.org/abs/2109.08684
作者:Jiancheng Yang,Yi He,Kaiming Kuang,Zudi Lin,Hanspeter Pfister,Bingbing Ni
机构: Shanghai Jiao Tong University, Shanghai, China, Dianei Technology, Shanghai, China, Harvard University, Cambridge MA, USA
备注:MICCAI 2021. The code and model are available at this https URL
摘要:三维上下文建模是高性能三维医学图像分析的关键。虽然2D网络受益于大规模2D监督预训练,但其捕获3D上下文的能力较弱;3D网络擅长利用3D上下文,却缺乏监督预训练。作为一种新兴技术,3D上下文融合算子能够由2D预训练网络转换而来,兼具两者的优点,并取得了巨大成功。现有的3D上下文融合算子被设计成空间对称的,即在每个2D切片上执行相同的操作。然而,这些算子并非真正的平移等变,特别是当只有少量3D切片用作输入时。在本文中,我们提出了一种新的非对称3D上下文融合算子(A3D),它使用不同的权重来融合来自不同2D切片的3D上下文。值得注意的是,A3D不是平移等变的,但它在不引入大量计算开销的情况下显著优于现有的对称上下文融合算子。我们在DeepLesion基准(一个用于从计算机断层扫描(CT)中进行通用病变检测的大型公共数据集)上进行了大量实验,验证了该方法的有效性。所提出的A3D始终以相当大的优势优于对称上下文融合算子,并在DeepLesion上确立了新的最先进水平。为了促进开放研究,我们的PyTorch代码和模型可在 https://github.com/M3DV/AlignShift 获取。
摘要:Modeling 3D context is essential for high-performance 3D medical image
analysis. Although 2D networks benefit from large-scale 2D supervised
pretraining, it is weak in capturing 3D context. 3D networks are strong in 3D
context yet lack supervised pretraining. As an emerging technique, \emph{3D
context fusion operator}, which enables conversion from 2D pretrained networks,
leverages the advantages of both and has achieved great success. Existing 3D
context fusion operators are designed to be spatially symmetric, i.e.,
performing identical operations on each 2D slice like convolutions. However,
these operators are not truly equivariant to translation, especially when only
a few 3D slices are used as inputs. In this paper, we propose a novel
asymmetric 3D context fusion operator (A3D), which uses different weights to
fuse 3D context from different 2D slices. Notably, A3D is NOT
translation-equivariant while it significantly outperforms existing symmetric
context fusion operators without introducing large computational overhead. We
validate the effectiveness of the proposed method by extensive experiments on
DeepLesion benchmark, a large-scale public dataset for universal lesion
detection from computed tomography (CT). The proposed A3D consistently
outperforms symmetric context fusion operators by considerable margins, and
establishes a new \emph{state of the art} on DeepLesion. To facilitate open
research, our code and model in PyTorch are available at
https://github.com/M3DV/AlignShift.
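The asymmetry can be illustrated with a deliberately simple fusion module (a sketch of the idea only, not the A3D operator itself): each depth position gets its own weights, so slices are no longer treated identically:

import torch
import torch.nn as nn

class AsymmetricSliceFusion(nn.Module):
    # Fuse features from D slices with slice-position-specific channel weights.
    def __init__(self, channels=64, depth=7):
        super().__init__()
        self.slice_weights = nn.Parameter(torch.randn(depth, channels, 1, 1))
    def forward(self, x):            # x: (batch, depth, channels, H, W) from a 2D backbone
        weighted = x * self.slice_weights.unsqueeze(0)   # a different weight per slice
        return weighted.sum(dim=1)                       # fused (batch, channels, H, W)

fused = AsymmetricSliceFusion()(torch.randn(2, 7, 64, 32, 32))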
分类|识别(8篇)
【1】 Contrastive Learning of Subject-Invariant EEG Representations for Cross-Subject Emotion Recognition
标题:用于跨被试情绪识别的被试不变脑电表征的对比学习
链接:https://arxiv.org/abs/2109.09559
作者:Xinke Shen,Xianggen Liu,Xin Hu,Dan Zhang,Sen Song
备注:17 pages, 14 figures, journal paper
摘要:情绪识别在人机交互和日常医疗保健中起着至关重要的作用。近年来,脑电信号被认为是信息丰富且可靠的情绪识别信号。然而,情绪相关脑电信号的被试间变异性给基于脑电的情绪识别的实际应用带来了巨大挑战。受最近关于被试间相关性的神经科学研究的启发,我们提出了一种用于可靠跨被试情绪识别的被试间对齐对比学习方法(CLISA)。对比学习通过最大化各被试在接受相同刺激(相对于不同刺激)时脑电信号的相似性,来最小化被试间差异。具体地说,我们使用一个具有深度方向空间卷积层和时间卷积层的卷积神经网络,从原始脑电信号中学习被试间对齐的时空表示,然后利用对齐后的表示提取差分熵特征进行情绪分类。该方法的性能在我们的80名被试的THU-EP数据集和公开的15名被试的SEED数据集上进行了评估,取得了与最先进方法相当或更好的跨被试情绪识别准确率(在THU-EP数据集上,二分类和九分类的准确率分别为72.1%和47.0%;在SEED数据集上,三分类的准确率为86.3%)。该方法也能很好地推广到未见过的情绪刺激。因此,CLISA方法有望以“即插即用”的方式大大提高基于EEG的情绪识别的实用性。此外,CLISA学习到的时空表征可以为人类情绪处理的神经机制提供洞见。
摘要:Emotion recognition plays a vital role in human-machine interactions and
daily healthcare. EEG signals have been reported to be informative and reliable
for emotion recognition in recent years. However, the inter-subject variability
of emotion-related EEG signals poses a great challenge for the practical use of
EEG-based emotion recognition. Inspired by the recent neuroscience studies on
inter-subject correlation, we proposed a Contrastive Learning method for
Inter-Subject Alignment (CLISA) for reliable cross-subject emotion recognition.
Contrastive learning was employed to minimize the inter-subject differences by
maximizing the similarity in EEG signals across subjects when they received the
same stimuli in contrast to different ones. Specifically, a convolutional
neural network with depthwise spatial convolution and temporal convolution
layers was applied to learn inter-subject aligned spatiotemporal
representations from raw EEG signals. Then the aligned representations were
used to extract differential entropy features for emotion classification. The
performance of the proposed method was evaluated on our THU-EP dataset with 80
subjects and the publicly available SEED dataset with 15 subjects. Comparable
or better cross-subject emotion recognition accuracy (i.e., 72.1% and 47.0% for
binary and nine-class classification, respectively, on the THU-EP dataset and
86.3% on the SEED dataset for three-class classification) was achieved as
compared to the state-of-the-art methods. The proposed method could be
generalized well to unseen emotional stimuli as well. The CLISA method is
therefore expected to considerably increase the practicality of EEG-based
emotion recognition by operating in a "plug-and-play" manner. Furthermore, the
learned spatiotemporal representations by CLISA could provide insights into the
neural mechanisms of human emotion processing.
【2】 Audio-Visual Speech Recognition is Worth 32$\times$32$\times$8 Voxels
标题:视听语音识别价值32×32×8体素
链接:https://arxiv.org/abs/2109.09536
作者:Dmitriy Serdyuk,Otavio Braga,Olivier Siohan
机构:Google,th Ave, New York, USA
备注:7 pages, 2 figures, 4 tables. A draft for a paper accepted to ASRU workshop
摘要:视听自动语音识别(AV-ASR)将视频模式引入语音识别过程,通常依靠说话人嘴的运动所传递的信息。使用视频信号需要提取视觉特征,然后将其与声学特征结合起来,构建AV-ASR系统[1]。这通常是通过计算机视觉领域广泛使用的某种形式的三维卷积网络(如VGG)来实现的。最近,图像转换器[2]已被引入,用于提取图像分类任务中有用的视觉特征。在这项工作中,我们建议用视频转换器前端取代三维卷积视觉前端。我们在由YouTube视频组成的大规模数据集上训练我们的系统,并在公开可用的LRS3-TED集以及大量YouTube视频集上评估性能。在唇读任务中,与强卷积基线相比,基于转换器的前端显示出优越的性能。在AV-ASR任务中,Transformer前端的性能与卷积基线相当(或优于卷积基线)。在LRS3-TED训练集上对我们的模型进行微调,使其与以前的技术水平相匹配。因此,我们通过实验证明了AV-ASR无卷积模型的可行性。
摘要:Audio-visual automatic speech recognition (AV-ASR) introduces the video
modality into the speech recognition process, often by relying on information
conveyed by the motion of the speaker's mouth. The use of the video signal
requires extracting visual features, which are then combined with the acoustic
features to build an AV-ASR system [1]. This is traditionally done with some
form of 3D convolutional network (e.g. VGG) as widely used in the computer
vision community. Recently, image transformers [2] have been introduced to
extract visual features useful for image classification tasks. In this work, we
propose to replace the 3D convolutional visual front-end with a video
transformer front-end. We train our systems on a large-scale dataset composed
of YouTube videos and evaluate performance on the publicly available LRS3-TED
set, as well as on a large set of YouTube videos. On a lip-reading task, the
transformer-based front-end shows superior performance compared to a strong
convolutional baseline. On an AV-ASR task, the transformer front-end performs
as well as (or better than) the convolutional baseline. Fine-tuning our model
on the LRS3-TED training set matches previous state of the art. Thus, we
experimentally show the viability of the convolution-free model for AV-ASR.
【3】 Incremental Learning Techniques for Online Human Activity Recognition
标题:用于在线人类活动识别的增量学习技术
链接:https://arxiv.org/abs/2109.09435
作者:Meysam Vakili,Masoumeh Rezaei
机构:Department of Computer Engineering, University of Science and Culture, Tehran, Iran; Department of Computer Engineering, Khorasan Institute of Higher Education, Razavi Khorasan, Iran
备注:16 pages, 5 figures, 7 tables
摘要:使用智能手机惯性传感器对人类活动进行不引人注目的智能识别是人工智能领域中一个有趣的话题,近年来在研究人员中获得了极大的普及。需要更多关注的一个相当大的挑战是实时检测身体活动,因为对于健康监测和老年护理等许多实际应用,需要立即识别用户的活动,以防止严重损害个人健康。在本文中,我们提出了一种用于在线预测身体运动的人类活动识别(HAR)方法,受益于增量学习算法的功能。我们开发了一个HAR系统,其中包含监控软件和一个移动应用程序,用于收集加速度计和陀螺仪数据,并通过互联网将其发送到远程服务器进行分类和识别操作。本文采用了六种增量学习算法并对其进行了评估,并与开发离线HAR系统常用的几种批量学习算法进行了比较。最后的结果表明,考虑到所有的性能评估指标,增量K近邻和增量朴素贝叶斯算法优于其他算法,实时识别准确率超过95%。
摘要:Unobtrusive and smart recognition of human activities using smartphone
inertial sensors is an interesting topic in the field of artificial
intelligence that has acquired tremendous popularity among researchers, especially in
recent years. A considerable challenge that needs more attention is the
real-time detection of physical activities, since for many real-world
applications such as health monitoring and elderly care, it is required to
recognize users' activities immediately to prevent severe damages to
individuals' wellness. In this paper, we propose a human activity recognition
(HAR) approach for the online prediction of physical movements, benefiting from
the capabilities of incremental learning algorithms. We develop a HAR system
containing monitoring software and a mobile application that collects
accelerometer and gyroscope data and sends them to a remote server via the
Internet for classification and recognition operations. Six incremental
learning algorithms are employed and evaluated in this work and compared with
several batch learning algorithms commonly used for developing offline HAR
systems. The final results indicate that, considering all performance
evaluation metrics, Incremental K-Nearest Neighbors and Incremental Naive
Bayesian outperformed other algorithms, exceeding a recognition accuracy of 95%
in real-time.
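示例(非论文原始代码):下面用Python给出在线HAR的最小示意,基于scikit-learn中GaussianNB的partial_fit做增量更新;其中的传感器流、窗口大小与活动标签均为演示性假设。
```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

ACTIVITIES = np.array([0, 1, 2, 3])   # 假设的活动标签,如走路/坐/站/跑
model = GaussianNB()

def features(window):
    """从一个窗口的加速度计+陀螺仪采样中提取简单统计特征(示意)。"""
    return np.concatenate([window.mean(axis=0), window.std(axis=0)])

rng = np.random.default_rng(0)
for step in range(1000):                      # 模拟传感器数据流
    window = rng.normal(size=(50, 6))         # 50个采样 x 6个通道(假设)
    label = int(rng.choice(ACTIVITIES))
    x = features(window).reshape(1, -1)
    if step > 0:
        pred = model.predict(x)               # 先预测再更新(前序式评估)
    model.partial_fit(x, [label], classes=ACTIVITIES)  # 增量更新
```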
【4】 Hybrid Data Augmentation and Deep Attention-based Dilated Convolutional-Recurrent Neural Networks for Speech Emotion Recognition
标题:混合数据增强和基于深度注意力的扩张卷积-递归神经网络语音情感识别
链接:https://arxiv.org/abs/2109.09026
作者:Nhat Truong Pham,Duc Ngoc Minh Dang,Sy Dzung Nguyen
备注:12 pages, 16 figures, 6 tables
摘要:语音情感识别(SER)一直是人机交互(HCI)应用中的重要任务之一。然而,很难选择最佳特征并处理不平衡的标记数据。在本文中,我们研究了混合数据增强(HDA)方法来生成和平衡基于传统和生成性对抗网络(GAN)方法的数据。为了评估HDA方法的有效性,通过将深度扩展卷积递归神经网络与注意机制相结合,设计了一个深度学习框架(ADCRNN)。此外,我们选择三维对数Mel谱图(MelSpec)特征作为深度学习框架的输入。此外,我们通过组合softmax损失和中心损失来重新配置损失函数,以对情绪进行分类。为了验证我们提出的方法,我们使用EmoDB数据集,该数据集由多个不平衡样本的情绪组成。实验结果表明,所提出的方法在EmoDB上的准确率比现有方法高,传统方法和基于GAN的方法的准确率分别为87.12%和88.47%。
摘要:Speech emotion recognition (SER) has been one of the significant tasks in
Human-Computer Interaction (HCI) applications. However, it is hard to choose
the optimal features and deal with imbalanced labeled data. In this article, we
investigate hybrid data augmentation (HDA) methods to generate and balance data
based on traditional and generative adversarial networks (GAN) methods. To
evaluate the effectiveness of HDA methods, a deep learning framework, namely
ADCRNN, is designed by integrating deep dilated convolutional-recurrent neural
networks with an attention mechanism. Besides, we choose 3D log Mel-spectrogram
(MelSpec) features as the inputs for the deep learning framework. Furthermore,
we reconfigure a loss function by combining a softmax loss and a center loss to
classify the emotions. For validating our proposed methods, we use the EmoDB
dataset that consists of several emotions with imbalanced samples. Experimental
results prove that the proposed methods achieve better accuracy than the
state-of-the-art methods on the EmoDB with 87.12% and 88.47% for the
traditional and GAN-based methods, respectively.
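示例(基于公开的librosa接口、参数为假设值的示意,并非论文的具体配置):构建摘要所述的三维log-Mel输入,即静态、一阶差分与二阶差分三个通道。
```python
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)   # 文件路径为占位符
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=512,
                                     hop_length=256, n_mels=64)
log_mel = librosa.power_to_db(mel)                # 静态通道
delta = librosa.feature.delta(log_mel, order=1)   # 一阶差分
delta2 = librosa.feature.delta(log_mel, order=2)  # 二阶差分
feat = np.stack([log_mel, delta, delta2], axis=0) # 形状: (3, n_mels, 帧数)
```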
【5】 Multimodal Classification: Current Landscape, Taxonomy and Future Directions
标题:多模态分类:现状、分类与未来发展方向
链接:https://arxiv.org/abs/2109.09020
作者:William C. Sleeman IV,Rishabh Kapoor,Preetam Ghosh
机构: Virginia Commonwealth University
备注:24 pages, 3 tables, 7 figures
摘要:多模式分类研究在许多领域越来越受欢迎,这些领域从多个来源收集更多数据,包括卫星图像、生物特征和医学。然而,由于缺乏一致的术语和体系结构描述,很难比较不同的现有解决方案。我们通过提出一种新的分类法来解决这些挑战,该分类法基于最近关于多模式分类的出版物中发现的趋势来描述此类系统。对于多模态数据集,单峰分类的许多最困难的方面尚未完全解决,包括大数据、类不平衡和实例级困难。我们还讨论了这些挑战和未来方向。
摘要:Multimodal classification research has been gaining popularity in many
domains that collect more data from multiple sources including satellite
imagery, biometrics, and medicine. However, the lack of consistent terminology
and architectural descriptions makes it difficult to compare different existing
solutions. We address these challenges by proposing a new taxonomy for
describing such systems based on trends found in recent publications on
multimodal classification. Many of the most difficult aspects of unimodal
classification have not yet been fully addressed for multimodal datasets
including big data, class imbalance, and instance level difficulty. We also
provide a discussion of these challenges and future directions.
【6】 Ensemble Learning using Error Correcting Output Codes: New Classification Error Bounds
标题:使用纠错输出码的集成学习:新的分类误差界
链接:https://arxiv.org/abs/2109.08967
作者:Hieu D. Nguyen,Mohammed Sarosh Khan,Nicholas Kaegi,Shen-Shyang Ho,Jonathan Moore,Logan Borys,Lucas Lavalva
机构:Department of Mathematics, Rowan University, Glassboro, NJ, USA; Department of Computer Science, Rowan University, Glassboro, NJ, USA
备注:14 pages, 11 figures
摘要:给出了机器学习中纠错输出码(ECOC)方法分类错误率的新界。这些边界相对于码字长度具有指数衰减复杂性,从理论上验证了ECOC方法的有效性。推导了两种不同模型的边界:第一种模型假设所有基本分类器都是独立的,第二种模型假设所有基本分类器相互关联到一阶。此外,我们对六个数据集进行了ECOC分类,并将它们的错误率与我们的界限进行了比较,以实验验证我们的工作,并显示相关性对分类精度的影响。
摘要:New bounds on classification error rates for the error-correcting output code
(ECOC) approach in machine learning are presented. These bounds have
exponential decay complexity with respect to codeword length and theoretically
validate the effectiveness of the ECOC approach. Bounds are derived for two
different models: the first under the assumption that all base classifiers are
independent and the second under the assumption that all base classifiers are
mutually correlated up to first-order. Moreover, we perform ECOC classification
on six datasets and compare their error rates with our bounds to experimentally
validate our work and show the effect of correlation on classification
accuracy.
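示例(示意性质):scikit-learn的OutputCodeClassifier即为ECOC集成的一个现成实现,其中code_size控制码字长度与类别数之比;数据集与基分类器仅为演示选择。
```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OutputCodeClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# code_size越大,码字越长;上文的误差界随码字长度呈指数衰减
ecoc = OutputCodeClassifier(LogisticRegression(max_iter=1000),
                            code_size=4.0, random_state=0)
ecoc.fit(X_tr, y_tr)
print("test error:", 1 - ecoc.score(X_te, y_te))
```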
【7】 Data Augmentation Through Monte Carlo Arithmetic Leads to More Generalizable Classification in Connectomics
标题:通过蒙特卡罗算术进行数据增强使连接组学中的分类更具泛化能力
链接:https://arxiv.org/abs/2109.09649
作者:Gregory Kiar,Yohan Chatelain,Ali Salari,Alan C. Evans,Tristan Glatard
机构:Montreal Neurological Institute, McGill University, Montreal, QC, Canada, Department of Computer Science and Computer Engineering, Concordia University, Montreal, QC, Canada, Child Mind Institute
摘要:机器学习模型通常应用于人脑成像数据集,试图将功能或结构与行为、健康或其他个体表型联系起来。此类模型通常依赖于复杂处理管道生成的低维映射。然而,管道固有的数值不稳定性限制了这些映射的保真度,并引入了计算偏差。蒙特卡罗算术是一种引入受控数量数值噪声的技术,用于干扰结构连接组估计管道,最终为每个样本生成一系列合理的网络。扰动网络的可变性被捕获到一个扩充数据集中,然后用于年龄分类任务。我们发现,在一系列数字扰动结果中对大脑网络进行重采样,可以提高所有测试分类器、预处理策略和降维技术的性能。重要的是,我们发现这一好处并不取决于大量扰动,这表明即使对数据集进行最小扰动也会增加有意义的方差,这可以在随后设计的模型中捕捉到。
摘要:Machine learning models are commonly applied to human brain imaging datasets
in an effort to associate function or structure with behaviour, health, or
other individual phenotypes. Such models often rely on low-dimensional maps
generated by complex processing pipelines. However, the numerical instabilities
inherent to pipelines limit the fidelity of these maps and introduce
computational bias. Monte Carlo Arithmetic, a technique for introducing
controlled amounts of numerical noise, was used to perturb a structural
connectome estimation pipeline, ultimately producing a range of plausible
networks for each sample. The variability in the perturbed networks was
captured in an augmented dataset, which was then used for an age classification
task. We found that resampling brain networks across a series of such
numerically perturbed outcomes led to improved performance in all tested
classifiers, preprocessing strategies, and dimensionality reduction techniques.
Importantly, we find that this benefit does not hinge on a large number of
perturbations, suggesting that even minimally perturbing a dataset adds
meaningful variance which can be captured in the subsequently designed models.
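示例(仅为思路示意):真正的蒙特卡罗算术是在流水线的浮点运算层面注入噪声(例如借助Verificarlo等工具);这里只在输出层面用受控的相对噪声模拟其效果并扩充数据集,噪声强度为假设值。
```python
import numpy as np

def augment_connectomes(connectomes, n_perturb=5, rel_noise=1e-4, seed=0):
    """connectomes: 形状为 (受试者数, 节点数, 节点数) 的数组。"""
    rng = np.random.default_rng(seed)
    out = [connectomes]
    for _ in range(n_perturb):
        noise = rng.normal(0.0, rel_noise, size=connectomes.shape)
        out.append(connectomes * (1.0 + noise))   # 乘性扰动
    return np.concatenate(out, axis=0)            # 扩充后的数据集
```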
【8】 Dual-Encoder Architecture with Encoder Selection for Joint Close-Talk and Far-Talk Speech Recognition
标题:具有编码器选择的联合近、远距离语音识别的双编码器结构
链接:https://arxiv.org/abs/2109.08744
作者:Felix Weninger,Marco Gaudesi,Ralf Leibold,Roberto Gemello,Puming Zhan
机构:Nuance Communications, Burlington, MA, USA, Nuance Communications, Torino, Italy, Nuance Communications, Aachen, Germany
备注:To appear in ASRU 2021
摘要:在本文中,我们提出了一种用于近距离通话(CT)和远距离通话(FT)语音联合建模的双编码器ASR体系结构,以便结合CT和FT设备的优点以获得更好的准确性。其关键思想是添加编码器选择网络,以选择最佳输入源(CT或FT)和相应的编码器。我们对CT语音使用单通道编码器,对FT语音使用带空间滤波神经波束形成的多通道编码器,这两种编码器与编码器选择联合训练。我们在基于注意力和RNN Transducer的端到端ASR系统上验证了我们的方法。实验是用一个医疗用例中的会话语音进行的,该会话语音是用CT设备和麦克风阵列同时记录的。我们的结果表明,与在匹配条件下训练和测试的最佳单编码器系统相比,当同时使用CT和FT输入时,所提出的双编码器结构可获得高达9%的相对词错误率(WER)降低。
摘要:In this paper, we propose a dual-encoder ASR architecture for joint modeling
of close-talk (CT) and far-talk (FT) speech, in order to combine the advantages
of CT and FT devices for better accuracy. The key idea is to add an encoder
selection network to choose the optimal input source (CT or FT) and the
corresponding encoder. We use a single-channel encoder for CT speech and a
multi-channel encoder with Spatial Filtering neural beamforming for FT speech,
which are jointly trained with the encoder selection. We validate our approach
on both attention-based and RNN Transducer end-to-end ASR systems. The
experiments are done with conversational speech from a medical use case, which
is recorded simultaneously with a CT device and a microphone array. Our results
show that the proposed dual-encoder architecture obtains up to 9% relative WER
reduction when using both CT and FT input, compared to the best single-encoder
system trained and tested in matched condition.
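示例(PyTorch示意,编码器内部结构、特征维度与波束形成前端均做了简化假设):编码器选择网络对CT与FT两路编码输出产生软权重。
```python
import torch
import torch.nn as nn

class DualEncoder(nn.Module):
    def __init__(self, feat_dim=80, enc_dim=256):
        super().__init__()
        self.ct_encoder = nn.GRU(feat_dim, enc_dim, batch_first=True)
        self.ft_encoder = nn.GRU(feat_dim, enc_dim, batch_first=True)
        self.selector = nn.Sequential(nn.Linear(2 * feat_dim, 64),
                                      nn.ReLU(), nn.Linear(64, 2))

    def forward(self, ct_feats, ft_feats):       # 两路输入: (B, T, feat_dim)
        h_ct, _ = self.ct_encoder(ct_feats)
        h_ft, _ = self.ft_encoder(ft_feats)
        # 句级选择:由两路输入特征的均值决定权重
        sel_in = torch.cat([ct_feats.mean(1), ft_feats.mean(1)], dim=-1)
        w = torch.softmax(self.selector(sel_in), dim=-1)   # (B, 2)
        return w[:, :1, None] * h_ct + w[:, 1:, None] * h_ft
```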
表征(2篇)
【1】 Towards Representation Learning for Atmospheric Dynamics
标题:浅谈大气动力学的表征学习
链接:https://arxiv.org/abs/2109.09076
作者:Sebastian Hoffmann,Christian Lessig
机构:Dept. of Computer Science, Universität Magdeburg
摘要:在人为强迫下预测未来气候情景对于理解气候变化和评估潜在反作用技术的影响至关重要。这种预测的机器学习和混合技术依赖于对相关但往往微妙的影响敏感的信息度量。对于气候系统的关键部分大气动力学而言,“眼球测量”,即专家的目视检查,目前仍然是金标准。然而,在需要算法描述的机器学习系统中,它不能用作度量。基于中间神经网络激活作为学习指标基础的成功,例如在计算机视觉中,我们提出了一种专为大气动力学设计的新型自监督表示学习方法。我们的方法称为AtmoDist,在一项简单的辅助任务上训练神经网络:预测混乱的大气场序列元素之间的时间距离(例如,再分析或模拟得到的风场分量)。该任务迫使网络学习数据的重要内在方面,作为其层中的激活,因此可以从中获得判别度量。我们通过使用AtmoDist定义基于GAN的涡度和散度超分辨率的度量来证明这一点。我们的放大数据与高分辨率参考的真实统计数据非常匹配,并且显著优于基于均方误差的最新技术。由于AtmoDist是无监督的,只需要一个时间序列的字段,并使用一个简单的辅助任务,因此它可以用于旨在理解和缓解气候变化的广泛应用。
摘要:The prediction of future climate scenarios under anthropogenic forcing is
critical to understand climate change and to assess the impact of potentially
counter-acting technologies. Machine learning and hybrid techniques for this
prediction rely on informative metrics that are sensitive to pertinent but
often subtle influences. For atmospheric dynamics, a critical part of the
climate system, the "eyeball metric", i.e. a visual inspection by an expert, is
currently still the gold standard. However, it cannot be used as metric in
machine learning systems where an algorithmic description is required.
Motivated by the success of intermediate neural network activations as basis
for learned metrics, e.g. in computer vision, we present a novel,
self-supervised representation learning approach specifically designed for
atmospheric dynamics. Our approach, called AtmoDist, trains a neural network on
a simple, auxiliary task: predicting the temporal distance between elements of
a shuffled sequence of atmospheric fields (e.g. the components of the wind
field from a reanalysis or simulation). The task forces the network to learn
important intrinsic aspects of the data as activations in its layers, from
which a discriminative metric can hence be obtained. We demonstrate this by
using AtmoDist to define a metric for GAN-based super resolution of vorticity
and divergence. Our upscaled data matches closely the true statistics of a high
resolution reference and significantly outperforms the state-of-the-art based
on mean squared error. Since AtmoDist is unsupervised, only requires a temporal
sequence of fields, and uses a simple auxiliary task, it can be used in a wide
range of applications that aim to understand and mitigate climate change.
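示例(示意):AtmoDist辅助任务的数据构造,即采样成对的大气场并以二者的时间距离作为分类标签;场的形状、距离范围与后续网络均为假设。
```python
import numpy as np

def sample_pairs(fields, max_dist=16, n_pairs=1024, seed=0):
    """fields: 形状为 (T, H, W) 的按时间排序的场序列。"""
    rng = np.random.default_rng(seed)
    T = fields.shape[0]
    xs, ys = [], []
    for _ in range(n_pairs):
        d = int(rng.integers(1, max_dist + 1))           # 时间距离即标签
        t = int(rng.integers(0, T - d))
        xs.append(np.stack([fields[t], fields[t + d]]))  # 双通道输入
        ys.append(d - 1)
    return np.asarray(xs), np.asarray(ys)

# 在 (xs -> ys) 上训练一个CNN分类器后,其中间层激活
# 即可作为大气状态的学习度量使用。
```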
【2】 Interest-oriented Universal User Representation via Contrastive Learning
标题:基于对比学习的面向兴趣的通用用户表示
链接:https://arxiv.org/abs/2109.08865
作者:Qinghui Sun,Jie Gu,Bei Yang,XiaoXiao Xu,Renjun Xu,Shangde Gao,Hong Liu,Huan Xu
机构:Alibaba Group, HangZhou, China
备注:8 pages, during peer review
摘要:用户表示对于在行业中提供高质量的商业服务至关重要。通用用户表示最近受到了许多关注,通过它,我们可以摆脱为每个下游应用程序训练特定模型的繁琐工作。在本文中,我们试图从两个角度改进通用用户表示。首先,提出了一种对比的自监督学习范式来指导表征模型的训练。它提供了一个统一的框架,允许以数据驱动的方式进行长期或短期兴趣表示学习。此外,还提出了一种新的多兴趣提取模块。该模块引入一个兴趣字典来捕获给定用户的主要兴趣,然后通过行为聚合生成他/她的兴趣表示。实验结果证明了所学习的用户表示的有效性和适用性。
摘要:User representation is essential for providing high-quality commercial
services in industry. Universal user representation has received much interest
recently, with which we can be free from the cumbersome work of training a
specific model for each downstream application. In this paper, we attempt to
improve universal user representation from two points of view. First, a
contrastive self-supervised learning paradigm is presented to guide the
representation model training. It provides a unified framework that allows for
long-term or short-term interest representation learning in a data-driven
manner. Moreover, a novel multi-interest extraction module is presented. The
module introduces an interest dictionary to capture principal interests of the
given user, and then generates his/her interest-oriented representations via
behavior aggregation. Experimental results demonstrate the effectiveness and
applicability of the learned user representations.
3D|3D重建等相关(1篇)
【1】 Automatic 3D Ultrasound Segmentation of Uterus Using Deep Learning
标题:基于深度学习的子宫三维超声自动分割
链接:https://arxiv.org/abs/2109.09283
作者:Bahareh Behboodi,Hassan Rivaz,Susan Lalondrelle,Emma Harris
机构:Department of Electrical and, Computer Eng, Concordia University, Montreal, Canada, Institute of Cancer Research, London, United Kingdom
摘要:子宫的在线分割有助于有效的基于图像的指导,以便在宫颈癌放疗期间精确地将剂量传递到靶组织(子宫颈)。三维超声(US)可用于子宫成像,然而,由于子宫每日位置和形状变化大,膀胱充盈变化大,以及3D US图像的局限性,如高度方向分辨率低和成像畸变,因此在US图像中找到子宫边界的位置是一项具有挑战性的任务。以前关于子宫分割的研究主要集中于开发半自动算法,其中需要由专家临床医生进行手动初始化。由于对自动三维子宫分割的研究有限,当前研究的目的是利用最新的基于深度学习的算法克服半自动算法中手动初始化的需要。因此,我们开发了基于二维UNet的网络,该网络基于两种场景进行训练。在第一个场景中,我们分别在每个平面(即矢状面、冠状面、轴向)上训练3个不同的网络。在第二个场景中,我们提出的网络使用每个3D体积的所有平面进行训练。我们提出的原理图可以克服以前半自动算法的初始手动选择。
摘要:On-line segmentation of the uterus can aid effective image-based guidance for
precise delivery of dose to the target tissue (the uterocervix) during cervix
cancer radiotherapy. 3D ultrasound (US) can be used to image the uterus,
however, finding the position of uterine boundary in US images is a challenging
task due to large daily positional and shape changes in the uterus, large
variation in bladder filling, and the limitations of 3D US images such as low
resolution in the elevational direction and imaging aberrations. Previous
studies on uterus segmentation mainly focused on developing semi-automatic
algorithms which require manual initialization by an expert
clinician. Due to limited studies on the automatic 3D uterus segmentation, the
aim of the current study was to overcome the need for manual initialization in
the semi-automatic algorithms using the recent deep learning-based algorithms.
Therefore, we developed 2D UNet-based networks that are trained based on two
scenarios. In the first scenario, we trained 3 different networks on each plane
(i.e., sagittal, coronal, axial) individually. In the second scenario, our
proposed network was trained using all the planes of each 3D volume. Our
proposed scheme can overcome the initial manual selection required by previous
semi-automatic algorithms.
优化|敛散性(9篇)
【1】 On the Convergence of Tsetlin Machines for the AND and the OR Operators
标题:关于AND和OR算子的Tsetlin机的收敛性
链接:https://arxiv.org/abs/2109.09488
作者:Lei Jiao,Xuan Zhang,Ole-Christoffer Granmo
备注:arXiv admin note: text overlap with arXiv:2101.02547
摘要:Tsetlin机器(TM)是一种基于命题逻辑的新型机器学习算法,在一些模式识别问题上取得了最新的性能。在以往的研究中,已经分析了TM在1位运算和异或运算中的收敛性。为了完成对基本数字运算的分析,本文分析了输入训练样本分别跟随AND和OR算子时的收敛性。我们的分析表明,TM几乎可以肯定地收敛到复制AND和OR运算符,这是从无限时间范围内的训练数据中学习到的。对AND和OR运算符的分析,以及之前分析的1位和XOR运算,完成了布尔代数中基本运算符的收敛性分析。
摘要:The Tsetlin Machine (TM) is a novel machine-learning algorithm based on
propositional logic, which has obtained state-of-the-art performance on several
pattern recognition problems. In previous studies, the convergence properties
of TM for 1-bit operation and XOR operation have been analyzed. To make the
analyses for the basic digital operations complete, in this article, we analyze
the convergence when input training samples follow AND and OR operators
respectively. Our analyses reveal that the TM can converge almost surely to
reproduce AND and OR operators, which are learnt from training data over an
infinite time horizon. The analyses on AND and OR operators, together with the
previously analysed 1-bit and XOR operations, complete the convergence analyses
on basic operators in Boolean algebra.
【2】 Asymptotic Optimality for Decentralised Bandits
标题:分散多臂老虎机的渐近最优性
链接:https://arxiv.org/abs/2109.09427
作者:Conor Newton,Ayalvadi Ganesh,Henry W. J. Reeve
机构:School of Mathematics, University of Bristol
摘要:我们考虑大量智能体在具有大量臂的多臂老虎机问题上协作。其目标是在通信受限的环境中最小化每个智能体的遗憾。我们提出了一种分散算法,该算法建立在Chawla等人(arxiv:2001.05452)的Gossip-Insert-Eliminate方法的基础上并对其进行了改进。我们对产生的遗憾进行了理论分析,结果表明我们的算法是渐近最优的。事实上,我们的遗憾保证匹配在完全通信设置中可实现的渐近最优速率。最后,我们给出了支持我们结论的实证结果。
摘要:We consider a large number of agents collaborating on a multi-armed bandit
problem with a large number of arms. The goal is to minimise the regret of each
agent in a communication-constrained setting. We present a decentralised
algorithm which builds upon and improves the Gossip-Insert-Eliminate method of
Chawla et al. arxiv:2001.05452. We provide a theoretical analysis of the regret
incurred which shows that our algorithm is asymptotically optimal. In fact, our
regret guarantee matches the asymptotically optimal rate achievable in the full
communication setting. Finally, we present empirical results which support our
conclusions.
【3】 Computationally Efficient High-Dimensional Bayesian Optimization via Variable Selection
标题:基于变量选择的高效高维贝叶斯优化算法
链接:https://arxiv.org/abs/2109.09264
作者:Yihang Shen,Carl Kingsford
机构:Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA
摘要:贝叶斯优化(BO)是一种全局优化黑盒函数的方法。虽然BO已成功应用于许多场景,但开发可扩展到高维域函数的有效BO算法仍然是一项挑战。通过vanilla BO优化这些函数非常耗时。基于将高维空间嵌入到低维空间的思想的高维BO替代策略对嵌入维度的选择非常敏感,这需要预先指定。我们开发了一种新的计算效率高的高维BO方法,利用变量选择。我们的方法能够自动学习轴对齐的子空间,即包含选定变量的空间,而不需要任何预先指定的超参数。我们从理论上分析了算法的计算复杂度,并推导了遗憾界。我们通过实验证明了我们的方法在几个综合问题和实际问题上的有效性。
摘要:Bayesian Optimization (BO) is a method for globally optimizing black-box
functions. While BO has been successfully applied to many scenarios, developing
effective BO algorithms that scale to functions with high-dimensional domains
is still a challenge. Optimizing such functions by vanilla BO is extremely
time-consuming. Alternative strategies for high-dimensional BO that are based
on the idea of embedding the high-dimensional space to the one with low
dimension are sensitive to the choice of the embedding dimension, which needs
to be pre-specified. We develop a new computationally efficient
high-dimensional BO method that exploits variable selection. Our method is able
to automatically learn axis-aligned sub-spaces, i.e. spaces containing selected
variables, without the demand of any pre-specified hyperparameters. We
theoretically analyze the computational complexity of our algorithm and derive
the regret bound. We empirically show the efficacy of our method on several
synthetic and real problems.
【4】 An Accelerated Variance-Reduced Conditional Gradient Sliding Algorithm for First-order and Zeroth-order Optimization
标题:一阶和零阶优化的加速减方差条件梯度滑动算法
链接:https://arxiv.org/abs/2109.08858
作者:Xiyuan Wei,Bin Gu,Heng Huang
机构:School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing, Jiangsu, China, MBZUAI, United Arab Emirates, JD Finance America Corporation, Department of Electrical and Computer Engineering, University of Pittsburgh
备注:44 pages, 22 figures
摘要:条件梯度算法(也称为Frank-Wolfe算法)由于其解决约束问题的无投影特性,最近在机器学习界重新流行起来。虽然已经提出了许多条件梯度算法的变体来提高性能,但它们依赖于一阶信息(梯度)进行优化。自然地,这些算法无法在日益流行的零阶优化领域正常工作,在该领域中只有零阶信息(函数值)可用。为了填补这一空白,我们提出了一种新的有限和问题的加速方差减少条件梯度滑动(ARCS)算法,该算法可以使用一阶或零阶信息进行优化。据我们所知,ARCS是第一个解决零阶优化凸问题的零阶条件梯度滑动型算法。在一阶优化中,ARCS的收敛结果在梯度查询次数方面明显优于以前的算法。最后,我们通过在真实数据集上的实验验证了ARCS的优越性。
摘要:The conditional gradient algorithm (also known as the Frank-Wolfe algorithm)
has recently regained popularity in the machine learning community due to its
projection-free property to solve constrained problems. Although many variants
of the conditional gradient algorithm have been proposed to improve
performance, they depend on first-order information (gradient) to optimize.
Naturally, these algorithms are unable to function properly in the field of
increasingly popular zeroth-order optimization, where only zeroth-order
information (function value) is available. To fill in this gap, we propose a
novel Accelerated variance-Reduced Conditional gradient Sliding (ARCS)
algorithm for finite-sum problems, which can use either first-order or
zeroth-order information to optimize. To the best of our knowledge, ARCS is the
first zeroth-order conditional gradient sliding type algorithm solving convex
problems in zeroth-order optimization. In first-order optimization, the
convergence results of ARCS substantially outperform previous algorithms in
terms of the number of gradient oracle queries. Finally, we validated the
superiority of ARCS by experiments on real-world datasets.
【5】 A Deep-Learning Based Optimization Approach to Address Stop-Skipping Strategy in Urban Rail Transit Lines
标题:基于深度学习的城市轨道交通线路越站策略优化方法
链接:https://arxiv.org/abs/2109.08786
作者:Mohammadjavad Javadinasr,Amir Bahador Parsa,Abolfazl Mohammadian
机构:Department of Civil and Materials Engineering, University of Illinois at Chicago, W. Taylor St, Chicago, IL
备注:23 pages, 6 figures
摘要:公交车站不同的乘客需求率强调了采用运营策略提供需求响应服务的重要性。为了提高乘客的出行时间,本研究引入了一种先进的数据驱动优化方法来确定城市轨道交通线路的最优停跳模式。具体而言,首先,使用整个月的时间序列智能卡数据,我们采用长短时记忆(LSTM)深度学习模型来预测高峰时段的车站级需求率。该预测基于前四个小时,尤其重要的是要知道高峰小时的真实需求率是只有在高峰小时运行完成后才能获得的后验信息。此外,利用实时预测而不是假设固定需求率,使我们能够考虑可能对后续分析不利的意外实时变化。然后,我们将LSTM模型的输出作为优化模型的输入,以最小化用户的总出行时间为目标。考虑到问题的指数性质,我们提出了一种蚁群优化技术来在适当的时间内解决问题。最后,使用实际案例数据对所提出的模型和求解算法的性能进行了评估。结果表明,所提出的方法可以通过改善乘客的车内时间和乘客的等待时间来提高服务的性能。
摘要:Different passenger demand rates in transit stations underscore the
importance of adopting operational strategies to provide a demand-responsive
service. Aiming at improving passengers' travel time, the present study
introduces an advanced data-driven optimization approach to determine the
optimal stop-skip pattern in urban rail transit lines. In detail, first, using
the time-series smart card data for an entire month, we employ a Long
Short-Term Memory (LSTM) deep learning model to predict the station-level
demand rates for the peak hour. This prediction is based on four preceding
hours and is especially important knowing that the true demand rates of the
peak hour are posterior information that can be obtained only after the peak
hour operation is finished. Moreover, utilizing a real-time prediction instead
of assuming fixed demand rates, allows us to account for unexpected real-time
changes which can be detrimental to the subsequent analyses. Then, we integrate
the output of the LSTM model as an input to an optimization model with the
objective of minimizing patrons' total travel time. Considering the exponential
nature of the problem, we propose an Ant Colony Optimization technique to solve
the problem in a desirable amount of time. Finally, the performance of the
proposed models and the solution algorithm is assessed using real case data.
The results suggest that the proposed approach can enhance the performance of
the service by improving both passengers' in-vehicle time as well as
passengers' waiting time.
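示例(PyTorch示意,站点数量、窗口与超参数均为假设):第一阶段的LSTM将前四个小时的站级需求映射为高峰小时的需求率,其输出随后输入停站优化模型。
```python
import torch
import torch.nn as nn

class DemandLSTM(nn.Module):
    def __init__(self, n_stations, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_stations, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_stations)

    def forward(self, x):                # x: (批大小, 4小时, 站点数)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])     # 预测的高峰小时需求率

model = DemandLSTM(n_stations=30)
x = torch.randn(8, 4, 30)                # 模拟的智能卡需求窗口
peak_demand = model(x)                    # 供后续蚁群优化使用
```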
【6】 The Optimization of the Constant Flow Parallel Micropump Using RBF Neural Network
标题:基于RBF神经网络的恒流量并联微泵优化设计
链接:https://arxiv.org/abs/2109.08717
作者:Chenyang Ma,Boyuan Xu
机构:University of Michigan, Department of EECS, Michigan, U.S., Shanghai Wufeng Scientific Instrument Co., Ltd., R&D Department, Shanghai, China
摘要:本工作的目的是优化具有平行泵室且包含被动止回阀的恒流平行机械排量微型泵的性能。关键任务是在左泵和右泵交换吸入和输液作用的往复运动过程中,最大限度地减少因反流引起的压力脉冲,这会对恒定流速产生负面影响。以前的工作试图通过被动止回阀的机械设计来解决这个问题。在这项工作中,提出了重叠时间的新概念,并通过实现一个由无监督学习和有监督学习共同训练的RBF神经网络,从控制理论的角度解决了这个问题。实验结果表明,压力脉冲在0.15~0.25 MPa范围内得到了优化,与泵的最大工作压力40 MPa相比有了显著的提高。
摘要:The objective of this work is to optimize the performance of a constant flow
parallel mechanical displacement micropump, which has parallel pump chambers
and incorporates passive check valves. The critical task is to minimize the
pressure pulse caused by regurgitation, which negatively impacts the constant
flow rate, during the reciprocating motion when the left and right pumps
interchange their role of aspiration and transfusion. Previous works attempt to
solve this issue via the mechanical design of passive check valves. In this
work, the novel concept of overlap time is proposed, and the issue is solved
from the aspect of control theory by implementing an RBF neural network trained
by both unsupervised and supervised learning. The experimental results indicate
that the pressure pulse is optimized in the range of 0.15 - 0.25 MPa, which is
a significant improvement compared to the maximum pump working pressure of 40
MPa.
【7】 A Data-Driven Convergence Bidding Strategy Based on Reverse Engineering of Market Participants' Performance: A Case of California ISO
标题:基于市场参与者绩效逆向工程的数据驱动收敛竞价策略--以加州ISO为例
链接:https://arxiv.org/abs/2109.09238
作者:Ehsan Samani,Mahdi Kohansal,Hamed Mohsenian-Rad
备注:IEEE Transactions on Power Systems
摘要:近几年来,收敛竞价(又称虚拟竞价)在批发电力市场上得到了广泛的应用。它为市场参与者提供了利用日前市场节点边际价格和实时市场节点边际价格之间的差异进行套利的机会。鉴于收敛投标(CBs)对电力市场的运行有着重大影响,了解市场参与者在现实世界中如何战略性地选择其CBs非常重要。我们将重点放在加利福尼亚ISO运营的电力市场上,解决这一公开问题。在这方面,我们使用公开的电力市场数据来学习、描述和评估市场参与者目前使用的不同类型的收敛竞价策略。我们的分析包括开发一种数据驱动的逆向工程方法,并将其应用于三年的真实数据。我们的分析涉及特征选择和基于密度的数据聚类,其结果是确定了加州ISO市场的三大CB策略集群,并分析了每一类策略的不同特点和性能。有趣的是,我们揭示了一种常见的现实世界策略,该策略与文献中任何现有的战略性收敛竞价方法都不匹配。接下来,我们根据从现有现实世界策略中吸取的经验教训,提出一个可以显著优于它们的新CB策略。新策略分为三个步骤:通过捕获价格峰值实现净利润最大化、动态节点标记和策略选择算法。我们通过案例研究表明,如果采用所提出的收敛竞价策略,最有利可图的市场参与者的年净利润可以增加40%以上。
摘要:Convergence bidding, a.k.a., virtual bidding, has been widely adopted in
wholesale electricity markets in recent years. It provides opportunities for
market participants to arbitrage on the difference between the day-ahead market
locational marginal prices and the real-time market locational marginal prices.
Given the fact that convergence bids (CBs) have a significant impact on the
operation of electricity markets, it is important to understand how market
participants strategically select their CBs in real-world. We address this open
problem with focus on the electricity market that is operated by the California
ISO. In this regard, we use the publicly available electricity market data to
learn, characterize, and evaluate different types of convergence bidding
strategies that are currently used by market participants. Our analysis
includes developing a data-driven reverse engineering method that we apply to
three years of real-world data. Our analysis involves feature selection and
density-based data clustering. It results in identifying three main clusters of
CB strategies in the California ISO market. Different characteristics and the
performance of each cluster of strategies are analyzed. Interestingly, we
unmask a common real-world strategy that does not match any of the existing
strategic convergence bidding methods in the literature. Next, we build upon
the lessons learned from the existing real-world strategies to propose a new CB
strategy that can significantly outperform them. Our analysis includes
developing a new strategy for convergence bidding. The new strategy has three
steps: net profit maximization by capturing price spikes, dynamic node
labeling, and strategy selection algorithm. We show through case studies that
the annual net profit for the most lucrative market participants can increase
by over 40% if the proposed convergence bidding strategy is used.
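示例(示意):特征选择后的竞价特征经标准化后用基于密度的方法聚类;特征名称为假设,论文的具体特征集与参数未在此复现。
```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# 行:市场参与者;列:如平均投标量、节点分布广度、
# 日前/实时价差敏感度等(均为演示性特征)
features = np.random.default_rng(0).normal(size=(200, 3))

X = StandardScaler().fit_transform(features)
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
print("clusters found:", set(labels) - {-1})   # -1 表示噪声点
```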
【8】 Topology, Convergence, and Reconstruction of Predictive States
标题:预测态的拓扑、收敛与重构
链接:https://arxiv.org/abs/2109.09203
作者:Samuel P. Loomis,James P. Crutchfield
机构:Complexity Sciences Center and Department of Physics and Astronomy, University of California at Davis, One Shields Avenue, Davis, CA
备注:16 pages, 4 figures; this http URL
摘要:离散随机过程中的预测等价已成功地应用于统计物理和混沌动力学系统中的随机性和结构识别以及隐马尔可夫模型的推断。我们研究了从时间序列数据可靠地重建预测状态的条件,表明在弱拓扑度量下,从经验样本可以实现预测状态的收敛。此外,预测状态可以在复制弱拓扑的希尔伯特空间中表示。我们从数学上解释了这些表示在重建高记忆过程时特别有用的原因,并将它们与再生核希尔伯特空间联系起来。
摘要:Predictive equivalence in discrete stochastic processes has been applied
with great success to identify randomness and structure in statistical physics
and chaotic dynamical systems and to infer hidden Markov models. We examine
the conditions under which they can be reliably reconstructed from time-series
data, showing that convergence of predictive states can be achieved from
empirical samples in the weak topology of measures. Moreover, predictive states
may be represented in Hilbert spaces that replicate the weak topology. We
mathematically explain how these representations are particularly beneficial
when reconstructing high-memory processes and connect them to reproducing
kernel Hilbert spaces.
【9】 Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks
标题:浅层RELU神经网络的近极大最优估计
链接:https://arxiv.org/abs/2109.08844
作者:Rahul Parhi,Robert D. Nowak
机构:The authors are with the Department of Electrical and Computer Engineering, University of Wisconsin–Madison
摘要:我们研究了利用浅层(单隐层)ReLU神经网络从噪声数据中估计未知函数的问题。我们研究的估计器最小化数据拟合误差平方和加上与网络权重欧氏范数成比例的正则化项。这种最小化对应于训练具有权重衰减的神经网络的常用方法。当数据生成函数属于Radon域中的二阶有界变差函数空间时,我们量化了这些神经网络估计器的性能(均方误差)。该函数空间最近被提出作为与浅层ReLU神经网络相关联的自然函数空间。我们推导了该函数空间的估计问题的极大极小下界,并证明了在对数因子下神经网络估计是极大极小最优的。我们还证明了这是一个包含经典多元函数空间的“混合变分”函数空间,包括某些Sobolev空间和某些谱Barron空间。最后,我们使用这些结果来量化神经网络和线性方法(包括核方法)之间的差距。本文揭示了神经网络似乎打破了维度诅咒的现象。
摘要:We study the problem of estimating an unknown function from noisy data using
shallow (single-hidden layer) ReLU neural networks. The estimators we study
minimize the sum of squared data-fitting errors plus a regularization term
proportional to the Euclidean norm of the network weights. This minimization
corresponds to the common approach of training a neural network with weight
decay. We quantify the performance (mean-squared error) of these neural network
estimators when the data-generating function belongs to the space of functions
of second-order bounded variation in the Radon domain. This space of functions
was recently proposed as the natural function space associated with shallow
ReLU neural networks. We derive a minimax lower bound for the estimation
problem for this function space and show that the neural network estimators are
minimax optimal up to logarithmic factors. We also show that this is a "mixed
variation" function space that contains classical multivariate function spaces
including certain Sobolev spaces and certain spectral Barron spaces. Finally,
we use these results to quantify a gap between neural networks and linear
methods (which include kernel methods). This paper sheds light on the
phenomenon that neural networks seem to break the curse of dimensionality.
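示例(示意):上文分析的估计器形式,即以平方误差加网络权重的正则项训练单隐层ReLU网络;这里按常见的权重衰减写法使用权重范数的平方作为罚项,数据与正则化强度均为占位设置。
```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand(200, 1)
y = torch.sin(4 * x) + 0.1 * torch.randn_like(x)   # 含噪样本

net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
lam = 1e-4                                          # 正则化系数(假设)
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    penalty = sum((p ** 2).sum() for p in net.parameters())
    loss = ((net(x) - y) ** 2).sum() + lam * penalty
    loss.backward()
    opt.step()
```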
预测|估计(7篇)
【1】 Neural forecasting at scale
标题:尺度上的神经预测
链接:https://arxiv.org/abs/2109.09705
作者:Philippe Chatigny,Boris N. Oreshkin,Jean-Marc Patenaude and,Shengrui Wang
机构:University of Sherbrooke, Sherbrooke, QC, Canada, Unity Technologies Labs, Montreal, QC, Canada, Laplace Insights, Sherbrooke, QC, Canada
摘要:我们研究了基于深度神经网络的大规模时间序列(TS)预测的有效尺度集成问题。当前最先进的深度集成模型具有很高的内存和计算要求,妨碍了它们在实际场景中预测数以百万计的TS。我们提出了N-BEATS(P),这是N-BEATS模型的一个全局多变量变体,用于同时训练多个单变量TS预测模型。我们的模型解决了相关模型的实际局限性,将训练时间减少了一半,内存需求减少了5倍,同时保持了相同的精度水平。我们进行了多次实验,详细说明了训练模型的各种方法,并获得了证明其支持零样本TS预测的能力的结果,即在源TS数据集上训练神经网络,并在不同的目标TS数据集上部署它,而无需再训练,它提供了一种高效可靠的解决方案,即使在困难的预测条件下也能进行大规模预测。
摘要:We study the problem of efficiently scaling ensemble-based deep neural
networks for time series (TS) forecasting on a large set of time series.
Current state-of-the-art deep ensemble models have high memory and
computational requirements, hampering their use to forecast millions of TS in
practical scenarios. We propose N-BEATS(P), a global multivariate variant of
the N-BEATS model designed to allow simultaneous training of multiple
univariate TS forecasting models. Our model addresses the practical limitations
of related models, reducing the training time by half and memory requirement by
a factor of 5, while keeping the same level of accuracy. We have performed
multiple experiments detailing the various ways to train our model and have
obtained results that demonstrate its capacity to support zero-shot TS
forecasting, i.e., to train a neural network on a source TS dataset and deploy
it on a different target TS dataset without retraining, which provides an
efficient and reliable solution to forecast at scale even in difficult
forecasting conditions.
【2】 Predictive Quality of Service (PQoS): The Next Frontier for Fully Autonomous Systems
标题:预测服务质量(PQoS):全自主系统的下一个前沿
链接:https://arxiv.org/abs/2109.09376
作者:Mate Boban,Marco Giordani,Michele Zorzi
机构:Marco Giordani and Michele Zorzi are with the Department of Information Engineering (DEI), University of Padova
备注:This paper has been accepted for publication in IEEE Network, 2021. 7 pages, 5 figures, 1 table
摘要:软件、硬件、计算和控制方面的最新进展推动了自主系统领域的重大进展。值得注意的是,自主机器应该持续地估计它们移动和操作的场景在预定义的时间框架内如何发展,并预测网络是否能够实现约定的服务质量(QoS)。如果不能,应采取适当的对策以满足应用要求。沿着这些思路,在本文中,我们提出了在自治系统中实现预测QoS(PQoS)的可能方法,并讨论了哪些用例将特别受益于网络预测。然后,我们阐明了该领域仍有待于未来研究的挑战。作为一个案例研究,我们展示了机器学习能否在类似遥控驾驶的用例中促进PQoS,作为不同测量信号的函数。
摘要:Recent advances in software, hardware, computing and control have fueled
significant progress in the field of autonomous systems. Notably, autonomous
machines should continuously estimate how the scenario in which they move and
operate will evolve within a predefined time frame, and foresee whether or not
the network will be able to fulfill the agreed Quality of Service (QoS). If
not, appropriate countermeasures should be taken to satisfy the application
requirements. Along these lines, in this paper we present possible methods to
enable predictive QoS (PQoS) in autonomous systems, and discuss which use cases
will particularly benefit from network prediction. Then, we shed light on the
challenges in the field that are still open for future research. As a case
study, we demonstrate whether machine learning can facilitate PQoS in a
teleoperated-driving-like use case, as a function of different measurement
signals.
【3】 Rethnicity: Predicting Ethnicity from Names
标题:Rethnicity:从名字中预测种族
链接:https://arxiv.org/abs/2109.09228
作者:Fangzhou Xie
机构:Department of Economics, Rutgers University
摘要:我提供了一个R包,\texttt{rethnicity},用于根据姓名预测种族。我使用双向LSTM作为模型,佛罗里达州选民登记作为训练数据。通过调整数据集中的不平衡性,特别注意少数群体的准确性。我还将可用性、准确性和性能与其他通过姓名预测种族的解决方案进行了比较。DIME数据集的示例代码片段和分析也显示为该包的应用程序。
摘要:I provide an R package, \texttt{rethnicity}, for predicting ethnicity from
names. I use the Bidirectional LSTM as the model and Florida Voter Registration
as training data. Special care is given to the accuracy of minority groups by
adjusting the imbalance in the dataset. I also compare the availability,
accuracy, and performance with other solutions for predicting ethnicity from
names. Sample code snippet and analysis of the DIME dataset are also shown as
applications of the package.
【4】 Harnessing the Power of Ego Network Layers for Link Prediction in Online Social Networks
标题:利用EGO网络层的力量进行在线社交网络中的链接预测
链接:https://arxiv.org/abs/2109.09190
作者:Mustafa Toprak,Chiara Boldrini,Andrea Passarella,Marco Conti
备注:This work was partially funded by the following projects. European Union's Horizon 2020 research and innovation programme: SoBigData++ (No 871042), HumaneAI-Net (No 952026), MARVEL (No 957337). Italian PON-MISE program: OK-INSAID project (No ARS01 00917). CHIST-ERA program: SAI project (grant CHIST-ERA-19-XAI-010, funded by MUR, grant number not yet available)
摘要:能够推荐在线社交网络中用户之间的链接对于用户与志同道合的个人以及平台本身和利用社交媒体信息发展业务的第三方来说都很重要。预测通常基于无监督或有监督学习,通常利用简单但有效的图形拓扑信息,如公共邻居的数量。然而,我们认为,更丰富的个人社会结构信息可能会导致更好的预测。在本文中,我们建议利用成熟的社会认知理论来提高链接预测性能。根据这些理论,个人平均沿着五个亲密度降低的同心圆来安排他们的社会关系。我们假设,在预测新的联系时,不同圈子中的关系具有不同的重要性。为了验证这一说法,我们重点研究了流行的特征提取预测算法(无监督和有监督),并将其扩展到社交圈。我们利用由视频游戏玩家和普通用户组成的两个Twitter数据集,根据几个基准(包括它们的基线版本以及基于节点嵌入和GNN的链接预测)验证了这些圈感知算法的预测性能。我们表明,社会意识通常能显著提高预测性能,同时也优于node2vec和SEAL等最先进的解决方案,并且不会增加计算复杂性。最后,我们展示了社会意识可以用来代替使用分类器(可能代价高昂或不切实际)来针对特定类别的用户。
摘要:Being able to recommend links between users in online social networks is
important for users to connect with like-minded individuals as well as for the
platforms themselves and third parties leveraging social media information to
grow their business. Predictions are typically based on unsupervised or
supervised learning, often leveraging simple yet effective graph topological
information, such as the number of common neighbors. However, we argue that
richer information about personal social structure of individuals might lead to
better predictions. In this paper, we propose to leverage well-established
social cognitive theories to improve link prediction performance. According to
these theories, individuals arrange their social relationships along, on
average, five concentric circles of decreasing intimacy. We postulate that
relationships in different circles have different importance in predicting new
links. In order to validate this claim, we focus on popular feature-extraction
prediction algorithms (both unsupervised and supervised) and we extend them to
include social-circles awareness. We validate the prediction performance of
these circle-aware algorithms against several benchmarks (including their
baseline versions as well as node-embedding- and GNN-based link prediction),
leveraging two Twitter datasets comprising a community of video gamers and
generic users. We show that social-awareness generally provides significant
improvements in the prediction performance, beating also state-of-the-art
solutions like node2vec and SEAL, and without increasing the computational
complexity. Finally, we show that social-awareness can be used in place of
using a classifier (which may be costly or impractical) for targeting a
specific category of users.
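示例(示意):"圈层感知"的共同邻居得分的一种可能写法,即按共同邻居所处的自我网络圈层加权;圈层划分规则与权重均为假设。
```python
import networkx as nx

CIRCLE_WEIGHTS = {1: 5.0, 2: 4.0, 3: 3.0, 4: 2.0, 5: 1.0}  # 由内到外

def circle_of(g, u, v):
    """由交互频率(边权)粗略推断圈层(演示用规则)。"""
    w = g[u][v].get("weight", 1.0)
    return min(5, max(1, int(6 - w)))   # 联系越密切 -> 越内圈

def circle_aware_cn(g, u, v):
    score = 0.0
    for z in nx.common_neighbors(g, u, v):
        score += CIRCLE_WEIGHTS[circle_of(g, u, z)]
        score += CIRCLE_WEIGHTS[circle_of(g, v, z)]
    return score
```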
【5】 Capacitance Resistance Model and Recurrent Neural Network for Well Connectivity Estimation : A Comparison Study
标题:电容电阻模型与递归神经网络在油井连通性估算中的对比研究
链接:https://arxiv.org/abs/2109.08779
作者:Deepthi Sen
机构:Texas A&M University
备注:for CRM module, see this https URL
摘要:在本报告中,比较了两种常用的预测水驱条件下油井产量的数据驱动模型:电容-电阻模型(CRM)和递归神经网络(RNN)。这两个模型都是完全数据驱动的,旨在从历史数据中了解洪水期间水库的行为。本报告是相关GitHub存储库中提供的基于python的CRM模型实现的技术指南。
摘要:In this report, two commonly used data-driven models for predicting well
production under a waterflood setting are compared: the capacitance resistance
model (CRM) and recurrent neural networks (RNN). Both models are completely
data-driven and are intended to learn the reservoir behavior during a water
flood from historical data. This report serves as a technical guide to the
python-based implementation of the CRM model available from the associated
GitHub repository.
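作为参考,文献中常见的单容(CRMP)形式的产量方程如下(示意;仓库中实现的具体变体可能不同):
```latex
q_j(t_k) = q_j(t_{k-1})\,e^{-\Delta t/\tau_j}
  + \bigl(1 - e^{-\Delta t/\tau_j}\bigr)
    \Bigl(\sum_i f_{ij}\, I_i(t_k)
          - J_j \tau_j \,\frac{\Delta p_{wf,j}}{\Delta t}\Bigr)
```
其中 $q_j$ 为生产井产量,$I_i$ 为注入量,$f_{ij}$ 为井间连通度增益,$\tau_j$ 为时间常数,$J_j$ 为生产指数,$p_{wf,j}$ 为井底流压。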
【6】 Learning to Forecast Dynamical Systems from Streaming Data
标题:学习从流数据中预测动态系统
链接:https://arxiv.org/abs/2109.09703
作者:Dimitris Giannakis,Amelia Henriksen,Joel A. Tropp,Rachel Ward
机构:Oden Institute, University of Texas at Austin, California Institute of Technology
备注:30 pages, 3 tables, 8 figures
摘要:内核模拟预测(KAF)是对动态生成的时间序列数据进行数据驱动、非参数预测的一种强大方法。这种方法在Koopman算子理论中具有严格的基础,并且它在实际中产生了良好的预测,但是它受到核方法常见的计算代价的影响。本文提出了一种KAF的流式算法,该算法只需要对训练数据进行一次遍历。该算法在不牺牲预测技能的前提下,大大降低了训练和预测的成本。计算实验表明,流KAF方法可以成功地预测数据稀缺和数据丰富两种情况下的几类动力学系统(周期、准周期和混沌)。作为流式内核回归的新模板,整体方法可能具有更广泛的兴趣。
摘要:Kernel analog forecasting (KAF) is a powerful methodology for data-driven,
non-parametric forecasting of dynamically generated time series data. This
approach has a rigorous foundation in Koopman operator theory and it produces
good forecasts in practice, but it suffers from the heavy computational costs
common to kernel methods. This paper proposes a streaming algorithm for KAF
that only requires a single pass over the training data. This algorithm
dramatically reduces the costs of training and prediction without sacrificing
forecasting skill. Computational experiments demonstrate that the streaming KAF
method can successfully forecast several classes of dynamical systems
(periodic, quasi-periodic, and chaotic) in both data-scarce and data-rich
regimes. The overall methodology may have wider interest as a new template for
streaming kernel regression.
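示例(精神上的示意而非论文的KAF算法):用随机傅里叶特征与SGD实现单遍流式核回归,每个样本只读取一次。
```python
import numpy as np

rng = np.random.default_rng(0)
d, D, lr = 3, 256, 1e-2                  # 输入维数、特征维数、步长(假设)
W = rng.normal(0.0, 1.0, size=(D, d))    # RBF核对应的随机频率
b = rng.uniform(0.0, 2 * np.pi, size=D)
theta = np.zeros(D)

def phi(x):                              # 随机傅里叶特征映射
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

# 对(模拟的)数据流做单遍扫描
for _ in range(10000):
    x = rng.normal(size=d)
    y = np.sin(x[0]) + 0.01 * rng.normal()    # 玩具目标
    z = phi(x)
    theta -= lr * (theta @ z - y) * z          # 一步SGD,数据只看一次
```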
【7】 Prediction of properties of metal alloy materials based on machine learning
标题:基于机器学习的金属合金材料性能预测
链接:https://arxiv.org/abs/2109.09394
作者:Houchen Zuo,Yongquan Jiang,Yan Yang,Jie Hu
机构:State Key Laboratory of Traction Power, Southwest Jiaotong University, Chengdu, China, School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China
摘要:密度泛函理论及其优化算法是材料性能计算的主要方法。虽然计算结果是准确的,但它花费了大量的时间和金钱。为了缓解这个问题,我们打算使用机器学习来预测材料的性能。本文利用开放的量子材料数据库,对金属合金的原子体积、原子能和原子形成能进行了实验研究。通过传统的机器学习模型、深度学习网络和自动机器学习,验证了机器学习在材料性能预测中的可行性。实验结果表明,机器学习能够准确地预测材料的性能。
摘要:Density functional theory and its optimization algorithm are the main methods
to calculate the properties in the field of materials. Although the calculation
results are accurate, it costs a lot of time and money. In order to alleviate
this problem, we intend to use machine learning to predict material properties.
In this paper, we conduct experiments on atomic volume, atomic energy and
atomic formation energy of metal alloys, using the open quantum material
database. Through the traditional machine learning models, deep learning
network and automated machine learning, we verify the feasibility of machine
learning in material property prediction. The experimental results show that
the machine learning can predict the material properties accurately.
其他神经网络|深度学习|模型|建模(36篇)
【1】 Machine-learning hidden symmetries
标题:机器学习隐藏对称
链接:https://arxiv.org/abs/2109.09721
作者:Ziming Liu,Max Tegmark
机构:Department of Physics, Massachusetts Institute of Technology, Cambridge, USA
备注:9 pages, 3 figs
摘要:我们提出了一种自动寻找隐藏对称的方法,定义为只有在新的坐标系中才能显现的对称,必须被发现。其核心思想是将不对称性量化为违反某些偏微分方程,并在所有可逆变换(参数化为可逆神经网络)的空间中以数值方式最小化这种违反。例如,我们的方法重新发现了著名的Gullstrand-Painleve度量,该度量体现了非旋转黑洞的Schwarzschild度量中隐藏的平移对称性,以及哈密顿性、模性和其他传统上不被视为对称性的简化特性。
摘要:We present an automated method for finding hidden symmetries, defined as
symmetries that become manifest only in a new coordinate system that must be
discovered. Its core idea is to quantify asymmetry as violation of certain
partial differential equations, and to numerically minimize such violation over
the space of all invertible transformations, parametrized as invertible neural
networks. For example, our method rediscovers the famous Gullstrand-Painleve
metric that manifests hidden translational symmetry in the Schwarzschild metric
of non-rotating black holes, as well as Hamiltonicity, modularity and other
simplifying traits not traditionally viewed as symmetries.
【2】 Generalization in Mean Field Games by Learning Master Policies
标题:学习主策略在平均场博弈中的泛化
链接:https://arxiv.org/abs/2109.09717
作者:Sarah Perrin,Mathieu Laurière,Julien Pérolat,Romuald Élie,Matthieu Geist,Olivier Pietquin
机构:Univ. Lille, CNRS, Inria, Centrale Lille, UMR CRIStAL, Google Research, Brain Team, DeepMind Paris
摘要:平均场博弈(Mean Field Games, MFGs)可以将多智能体系统扩展到极大规模的智能体群体。然而,大多数文献假设代理的初始分布是单一的,这限制了MFG的实际应用。得益于泛化能力,机器学习有可能解决更广泛多样的MFG问题。我们研究如何利用这些泛化属性来学习策略,使典型智能体能够针对任何人口分布进行最佳行为。参考MFGs中的主方程,我们创造了术语"主策略"来描述它们,并且我们证明了无论初始分布如何,单个主策略都提供了纳什均衡。我们提出了一种学习此类主策略的方法。我们的方法依赖于三个要素:将当前人口分布作为观察的一部分,使用神经网络近似主策略,以及通过强化学习和虚拟游戏进行训练。我们通过数值例子不仅说明了所学习的主策略的有效性,而且还说明了其超越用于训练的分布的泛化能力。
摘要:Mean Field Games (MFGs) can potentially scale multi-agent systems to
extremely large populations of agents. Yet, most of the literature assumes a
single initial distribution for the agents, which limits the practical
applications of MFGs. Machine Learning has the potential to solve a wider
diversity of MFG problems thanks to generalizations capacities. We study how to
leverage these generalization properties to learn policies enabling a typical
agent to behave optimally against any population distribution. In reference to
the Master equation in MFGs, we coin the term "Master policies" to describe
them and we prove that a single Master policy provides a Nash equilibrium,
whatever the initial distribution. We propose a method to learn such Master
policies. Our approach relies on three ingredients: adding the current
population distribution as part of the observation, approximating Master
policies with neural networks, and training via Reinforcement Learning and
Fictitious Play. We illustrate on numerical examples not only the efficiency of
the learned Master policy but also its generalization capabilities beyond the
distributions used for training.
【3】 Modeling Regime Shifts in Multiple Time Series
标题:多时间序列中政权转移的建模
链接:https://arxiv.org/abs/2109.09692
作者:Etienne Gael Tajeuna,Mohamed Bouguessa,Shengrui Wang
机构:University of Sherbrooke, University of Quebec at Montreal
摘要:我们研究了在一个由多个时间序列组成的生态系统中发现和建模制度变迁的问题,这些时间序列被称为协同进化时间序列。制度变迁是指在不同的时间间隔内,一系列行为表现出的变化。学习这些变化的行为是时间序列预测的关键一步。虽然已经取得了进展,但现有方法存在以下一个或多个缺点:(1)在发现多个时间序列中的状态时没有考虑时间序列之间的关系;(2) 缺乏一种有效的方法来模拟序列所表现出的时间依赖性行为;(3) 处理可能提供信息的数据不连续性方面的困难。现有的大多数方法无法在一个统一的框架内处理这三个问题。因此,这促使我们努力设计一种有原则的方法,用于在协同演化的时间序列中建模交互和时间依赖性。具体地说,我们通过将时间序列的重集合总结为一个称为\textit{mapping grid}的更轻且更有意义的结构,对多个时间序列的生态系统进行建模。通过使用映射网格,我们的模型首先通过动态网络表示学习时间序列的行为依赖,然后通过完全依赖时间的Cox回归模型学习制度变迁机制。我们的方法的独创性在于在政体识别中对时间序列之间的相互作用进行建模,并对与时间相关的政体过渡概率进行建模,在现有工作中通常假定这些概率是静态的。
摘要:We investigate the problem of discovering and modeling regime shifts in an
ecosystem comprising multiple time series known as co-evolving time series.
Regime shifts refer to the changing behaviors exhibited by series at different
time intervals. Learning these changing behaviors is a key step toward time
series forecasting. While advances have been made, existing methods suffer from
one or more of the following shortcomings: (1) failure to take relationships
between time series into consideration for discovering regimes in multiple time
series; (2) lack of an effective approach that models time-dependent behaviors
exhibited by series; (3) difficulties in handling data discontinuities which
may be informative. Most of the existing methods are unable to handle all of
these three issues in a unified framework. This, therefore, motivates our
effort to devise a principled approach for modeling interactions and
time-dependency in co-evolving time series. Specifically, we model an ecosystem
of multiple time series by summarizing the heavy ensemble of time series into a
lighter and more meaningful structure called a \textit{mapping grid}. By using
the mapping grid, our model first learns time series behavioral dependencies
through a dynamic network representation, then learns the regime transition
mechanism via a full time-dependent Cox regression model. The originality of
our approach lies in modeling interactions between time series in regime
identification and in modeling time-dependent regime transition probabilities,
usually assumed to be static in existing work.
【4】 Comparing Rewinding and Fine-tuning in Neural Network Pruning for Reproducibility Challenge 2021
标题:2021可重复性挑战赛中神经网络修剪中倒带和微调的比较
链接:https://arxiv.org/abs/2109.09670
作者:Szymon Mikler
机构:Uniwersytet Wrocławski
备注:9 pages, 6 figures
摘要:再现性范围:我们正在复现arXiv:2003.02389中的《Comparing Rewinding and Fine-tuning in Neural Networks》。在这项工作中,作者比较了修剪后重新训练神经网络的三种不同方法:1)微调,2)如arXiv:1803.03635中那样重绕权重,以及3)一种新的、原创的方法,即学习率倒带,建立在彩票假设的基础上。我们再现了所有三种方法的结果,但我们重点验证了他们提出的方法,即学习率倒带,因为它是新提出的,并且被描述为其他方法的通用替代方法。我们将CIFAR10用于大多数复现,并在更大的CIFAR100上进行附加实验,这扩展了作者最初提供的结果。我们还扩展了测试的网络体系结构列表,以包括Wide ResNet。新的实验使我们发现了学习率倒带的局限性:它会恶化大型体系结构上的修剪结果。结果:我们能够重现作者在所有最初报告的情景中报告的确切结果。然而,在更大的Wide ResNet上的扩展结果已经证明了新提出的学习率倒带的局限性:我们观察到在低稀疏度范围内先前未报告的精度下降。尽管如此,该论文的总体结论仍然有效,并确实得到了复现。
摘要:Scope of reproducibility: We are reproducing Comparing Rewinding and
Fine-tuning in Neural Networks from arXiv:2003.02389. In this work the authors
compare three different approaches to retraining neural networks after pruning:
1) fine-tuning, 2) rewinding weights as in arXiv:1803.03635 and 3) a new,
original method involving learning rate rewinding, building upon Lottery Ticket
Hypothesis. We reproduce the results of all three approaches, but we focus on
verifying their approach, learning rate rewinding, since it is newly proposed
and is described as a universal alternative to other methods.
We used CIFAR10 for most reproductions along with additional experiments on
the larger CIFAR100, which extends the results originally provided by the
authors. We have also extended the list of tested network architectures to
include Wide ResNets. The new experiments led us to discover the limitations of
learning rate rewinding which can worsen pruning results on large
architectures.
Results: We were able to reproduce the exact results reported by the authors
in all originally reported scenarios. However, extended results on larger Wide
Residual Networks have demonstrated the limitations of the newly proposed
learning rate rewinding -- we observed a previously unreported accuracy
degradation for low sparsity ranges. Nevertheless, the general conclusion of
the paper still holds and was indeed reproduced.
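示例(压缩的示意):一轮"幅值剪枝 + 学习率倒带"的骨架,即剪枝后从当前权重继续训练,但学习率日程被倒回到更早的轮次;模型、日程与稀疏度均为演示值。
```python
import torch
import torch.nn as nn

def magnitude_mask(model, sparsity=0.2):
    weights = torch.cat([p.abs().flatten() for p in model.parameters()])
    thresh = torch.quantile(weights, sparsity)
    return [(p.abs() >= thresh).float() for p in model.parameters()]

def lr_at_epoch(epoch):                  # 玩具式阶梯学习率日程
    return 0.1 if epoch < 10 else 0.01

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
masks = magnitude_mask(model)
rewind_epoch = 5                         # 只倒带学习率,保留已训练权重
for epoch in range(rewind_epoch, 20):
    opt = torch.optim.SGD(model.parameters(), lr=lr_at_epoch(epoch))
    # ……此处省略该轮的常规训练步骤……
    with torch.no_grad():                # 保持被剪枝的权重为零
        for p, m in zip(model.parameters(), masks):
            p.mul_(m)
```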
【5】 Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks
标题:动态神经多样化:通向计算可持续神经网络的途径
链接:https://arxiv.org/abs/2109.09612
作者:Alexander Kovalenko,Pavel Kordík,Magda Friedjungová
机构:Czech Technical University in Prague, Prague, Czech Republic
摘要:具有有限数量可训练参数的小型神经网络,对于许多简单任务来说是合适的资源高效候选,而现在使用的是过大的模型。然而,这种模型在学习过程中面临一些问题,主要是由于单个神经元的冗余,导致次优精度或需要额外的训练步骤。在这里,我们探索学习过程中隐藏层中神经元的多样性,并分析神经元的多样性如何影响模型的预测。下面,我们将介绍几种技术,以便在训练期间动态增强神经元之间的多样性。这些去相关技术改善了早期阶段的学习,偶尔有助于更快地克服局部极小值。此外,我们还描述了一种新的权值初始化方法,以获得快速有效的神经网络训练所需的不相关但随机的权值初始化。在我们的案例中,解相关权重初始化显示,在前5个时期,测试精度相对提高了约40%。
摘要:Small neural networks with a constrained number of trainable parameters can
be suitable resource-efficient candidates for many simple tasks where now
excessively large models are used. However, such models face several problems
during the learning process, mainly due to the redundancy of the individual
neurons, which results in sub-optimal accuracy or the need for additional
training steps. Here, we explore the diversity of the neurons within the hidden
layer during the learning process, and analyze how the diversity of the neurons
affects predictions of the model. Following this, we introduce several techniques
to dynamically reinforce diversity between neurons during the training. These
decorrelation techniques improve learning at early stages and occasionally help
to overcome local minima faster. Additionally, we describe a novel weight
initialization method to obtain decorrelated, yet stochastic weight
initialization for fast and efficient neural network training. Decorrelated
weight initialization in our case shows about 40% relative increase in test
accuracy during the first 5 epochs.
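示例(与上文思路一致的一种去相关手段,惩罚权重与结构为假设):对隐层激活的协方差矩阵的非对角元素施加惩罚,促使神经元保持多样性。
```python
import torch
import torch.nn as nn

def decorrelation_penalty(h):
    """h: (batch, n_hidden) 隐层激活。"""
    hc = h - h.mean(dim=0, keepdim=True)
    cov = hc.T @ hc / (h.shape[0] - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    return (off_diag ** 2).sum()

net = nn.Sequential(nn.Linear(20, 50), nn.Tanh())
head = nn.Linear(50, 3)
x, y = torch.randn(64, 20), torch.randint(0, 3, (64,))
h = net(x)
loss = nn.functional.cross_entropy(head(h), y) \
       + 1e-3 * decorrelation_penalty(h)
```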
【6】 Conditionally Parameterized, Discretization-Aware Neural Networks for Mesh-Based Modeling of Physical Systems
标题:物理系统网格化建模的条件参数化、离散化感知神经网络
链接:https://arxiv.org/abs/2109.09510
作者:Jiayang Xu,Aniruddhe Pradhan,Karthik Duraisamy
机构:Department of Aerospace Engineering, University of Michigan, Ann Arbor, MI
摘要:物理系统的数值模拟在很大程度上依赖于基于网格的模型。虽然人们已经广泛探索神经网络来辅助这类任务,但它们通常忽略输入特征之间的相互作用或层次关系,并将其作为串联的混合物进行处理。在这项工作中,我们推广了条件参数化的思想——使用输入参数的可训练函数来生成神经网络的权值,并以灵活的方式对其进行扩展,以编码对数值模拟至关重要的信息。受离散化数值方法的启发,参数的选择包括物理量和网格拓扑特征。建模特征和参数之间的功能关系被构建到网络体系结构中。该方法在不同的网络上实现,并应用于多个前沿科学机器学习任务,包括未建模物理的发现、粗场的超分辨率以及带有化学反应的非定常流的模拟。结果表明,与传统网络相比,条件参数化网络具有更高的性能。本文还提出了一种称为CP-GNet的网络结构,作为第一个能够独立预测不规则网格上反应流的深度学习模型。
摘要:The numerical simulations of physical systems are heavily dependent on
mesh-based models. While neural networks have been extensively explored to
assist such tasks, they often ignore the interactions or hierarchical relations
between input features, and process them as concatenated mixtures. In this
work, we generalize the idea of conditional parametrization -- using trainable
functions of input parameters to generate the weights of a neural network, and
extend them in a flexible way to encode information critical to the numerical
simulations. Inspired by discretized numerical methods, choices of the
parameters include physical quantities and mesh topology features. The
functional relation between the modeled features and the parameters are built
into the network architecture. The method is implemented on different networks,
which are applied to several frontier scientific machine learning tasks,
including the discovery of unmodeled physics, super-resolution of coarse
fields, and the simulation of unsteady flows with chemical reactions. The
results show that the conditionally parameterized networks provide superior
performance compared to their traditional counterparts. A network architecture
named CP-GNet is also proposed as the first deep learning model capable of
standalone prediction of reacting flows on irregular meshes.
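示例(条件参数化思想的最小示意,维度与条件向量均为假设):一个小层的权重由物理量/网格特征的可训练函数生成,而非固定参数。
```python
import torch
import torch.nn as nn

class ConditionalLinear(nn.Module):
    def __init__(self, in_dim, out_dim, cond_dim):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.weight_gen = nn.Linear(cond_dim, in_dim * out_dim)
        self.bias_gen = nn.Linear(cond_dim, out_dim)

    def forward(self, x, cond):
        # cond: 例如局部物理量或网格拓扑特征
        W = self.weight_gen(cond).view(-1, self.out_dim, self.in_dim)
        b = self.bias_gen(cond)
        return torch.einsum("boi,bi->bo", W, x) + b

layer = ConditionalLinear(in_dim=8, out_dim=16, cond_dim=4)
x, cond = torch.randn(32, 8), torch.randn(32, 4)
y = layer(x, cond)                       # 逐样本生成的权重
```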
【7】 DeepPhysics: a physics aware deep learning framework for real-time simulation
标题:深度物理:一种物理感知的实时仿真深度学习框架
链接:https://arxiv.org/abs/2109.09491
作者:Alban Odot,Ryadh Haferssas,Stéphane Cotin
机构:MIMESIS team, Inria, Place de l’Hopital, Strasbourg, France
摘要:弹性结构的实时仿真在许多应用中是必不可少的,从计算机引导的外科手术到机械工程中的交互式设计。有限元法常被用作求解与这些问题相关的偏微分方程的数值参考方法。然而,深度学习方法最近表明,它们可以代表一种解决基于物理的问题的替代策略[1,2,3]。在本文中,我们提出了一种使用数据驱动方法模拟超弹性材料的解决方案,其中训练神经网络来学习边界条件和产生的位移场之间的非线性关系。我们还介绍了一种方法来保证解决方案的有效性。总之,我们提出了三个贡献:基于模态分析的优化数据集生成算法、物理信息损失函数和混合牛顿-拉斐逊算法。该方法应用于两个基准:悬臂梁和螺旋桨。结果表明,我们使用有限数据训练的网络结构可以在不到一毫秒的时间内预测位移场。对于振幅为几厘米的非线性变形,对各种几何、拓扑、网格分辨率和边界条件的预测精确到几微米。
摘要:Real-time simulation of elastic structures is essential in many applications,
from computer-guided surgical interventions to interactive design in mechanical
engineering. The Finite Element Method is often used as the numerical method of
reference for solving the partial differential equations associated with these
problems. Yet, deep learning methods have recently shown that they could
represent an alternative strategy to solve physics-based problems [1,2,3]. In
this paper, we propose a solution to simulate hyper-elastic materials using a
data-driven approach, where a neural network is trained to learn the non-linear
relationship between boundary conditions and the resulting displacement field.
We also introduce a method to guarantee the validity of the solution. In total,
we present three contributions: an optimized data set generation algorithm
based on modal analysis, a physics-informed loss function, and a Hybrid
Newton-Raphson algorithm. The method is applied to two benchmarks: a cantilever
beam and a propeller. The results show that our network architecture trained
with a limited amount of data can predict the displacement field in less than a
millisecond. The predictions on various geometries, topologies, mesh
resolutions, and boundary conditions are accurate to a few micrometers for
non-linear deformations of several centimeters of amplitude.
【8】 Algorithmic Fairness Verification with Graphical Models
标题:基于图形模型的算法公平性验证
链接:https://arxiv.org/abs/2109.09447
作者:Bishwamittra Ghosh,Debabrota Basu,Kuldeep S. Meel
机构:School of Computing, National University of Singapore, Singapore, Équipe Scool, Univ. Lille, Inria, UMR CRIStAL, CNRS, Centrale Lille, France
摘要:近年来,机器学习(ML)算法已被应用于安全关键和高风险决策中,其中算法的公平性至关重要。ML中的公平性集中于检测由ML分类器引起的对特定人口群体的偏见,并提出算法解决方案,以缓解不同公平性定义的偏见。为此,已经提出了几个公平性验证器,在给定输入特征的概率分布的情况下,计算ML分类器预测中的偏差(基本上超出有限数据集)。在验证线性分类器的上下文中,现有公平性验证器受到精度的限制,这是由于特征之间的相关性建模不精确,以及由于分类器的限制性公式(如SSAT或SMT公式)或抽样而导致的可伸缩性。在本文中,我们提出了一种有效的公平性验证器,称为FVGM,它将特征之间的相关性编码为贝叶斯网络。与现有的验证方法相比,FVGM提出了一种基于随机子集和的线性分类器验证方法。在实验上,我们证明了FVGM比最新的公平性增强算法、公平性攻击和组/因果公平性度量更能准确和可扩展地评估更多种类的公平性增强算法、公平性攻击和组/因果公平性度量。我们还证明了FVGM有助于公平性影响函数的计算,作为检测特征子集引起的偏差源的垫脚石。
摘要:In recent years, machine learning (ML) algorithms have been deployed in
safety-critical and high-stake decision-making, where the fairness of
algorithms is of paramount importance. Fairness in ML centers on detecting bias
towards certain demographic populations induced by an ML classifier and
proposes algorithmic solutions to mitigate the bias with respect to different
fairness definitions. To this end, several fairness verifiers have been
proposed that compute the bias in the prediction of an ML classifier --
essentially beyond a finite dataset -- given the probability distribution of
input features. In the context of verifying linear classifiers, existing
fairness verifiers are limited by accuracy due to imprecise modelling of
correlations among features and scalability due to restrictive formulations of
the classifiers as SSAT or SMT formulas or by sampling. In this paper, we
propose an efficient fairness verifier, called FVGM, that encodes the
correlations among features as a Bayesian network. In contrast to existing
verifiers, FVGM proposes a stochastic subset-sum based approach for verifying
linear classifiers. Experimentally, we show that FVGM leads to an accurate and
scalable assessment for more diverse families of fairness-enhancing algorithms,
fairness attacks, and group/causal fairness metrics than the state-of-the-art.
We also demonstrate that FVGM facilitates the computation of fairness influence
functions as a stepping stone to detect the source of bias induced by subsets
of features.
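To make the verified quantity concrete, the toy sketch below draws features from a hand-rolled two-node Bayesian network (a protected attribute influencing a proxy feature) and estimates the disparate-impact ratio of a linear classifier. Plain Monte Carlo sampling stands in for FVGM's stochastic subset-sum algorithm; all distributions and parameters are invented for illustration.

import numpy as np

rng = np.random.default_rng(0)

def sample_features(n):
    # Toy Bayesian network: group -> income (the feature is correlated
    # with the protected attribute, the situation FVGM models precisely).
    group = rng.binomial(1, 0.5, n)
    income = rng.normal(loc=1.0 + 0.5 * group, size=n)
    return group, income

def fairness_gap(w, b, n=100_000):
    # Disparate-impact ratio of the linear classifier sign(w*income + b).
    group, income = sample_features(n)
    pred = (w * income + b > 0)
    return pred[group == 1].mean() / pred[group == 0].mean()  # 1.0 = parity

print(fairness_gap(w=1.0, b=-1.2))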
【9】 When Do Extended Physics-Informed Neural Networks (XPINNs) Improve Generalization?
标题:扩展物理信息神经网络(XPINN)何时能提高泛化能力?
链接:https://arxiv.org/abs/2109.09444
作者:Zheyuan Hu,Ameya D. Jagtap,George Em Karniadakis,Kenji Kawaguchi
机构: Brown University
备注:33 pages, 11 figures
摘要:物理信息神经网络(PINN)由于其良好的逼近能力和泛化能力,已成为求解高维偏微分方程(PDE)的热门选择。近年来,基于区域分解方法的扩展PINN(XPINN)因其在多尺度、多物理问题建模中的有效性和并行性而受到广泛关注。然而,关于它们的收敛性和泛化性质的理论理解仍有待探索。在这项研究中,我们朝着理解XPINN如何以及何时优于PINN迈出了第一步。具体来说,对于一般的多层PINN和XPINN,我们首先通过PDE问题中目标函数的复杂性提供一个先验泛化界,并通过优化后网络的后验矩阵范数提供一个后验泛化界。此外,基于我们的界,我们分析了XPINN提高泛化能力的条件。具体地说,我们的理论表明,XPINN的关键构建块,即区域分解,为泛化引入了折衷。一方面,XPINN将复杂的PDE解分解为几个简单的部分,这降低了学习每个部分所需的复杂性,并提高了泛化能力。另一方面,分解会导致每个子域中可用的训练数据较少,因此此类模型通常容易过度拟合,并且可能变得不那么泛化。根据经验,我们选择了五个偏微分方程来展示XPINN的性能何时优于、接近或差于PINN,从而验证并支持了我们的新理论。
摘要:Physics-informed neural networks (PINNs) have become a popular choice for
solving high-dimensional partial differential equations (PDEs) due to their
excellent approximation power and generalization ability. Recently, Extended
PINNs (XPINNs) based on domain decomposition methods have attracted
considerable attention due to their effectiveness in modeling multiscale and
multiphysics problems and their parallelization. However, theoretical
understanding on their convergence and generalization properties remains
unexplored. In this study, we take an initial step towards understanding how
and when XPINNs outperform PINNs. Specifically, for general multi-layer PINNs
and XPINNs, we first provide a prior generalization bound via the complexity of
the target functions in the PDE problem, and a posterior generalization bound
via the posterior matrix norms of the networks after optimization. Moreover,
based on our bounds, we analyze the conditions under which XPINNs improve
generalization. Concretely, our theory shows that the key building block of
XPINN, namely the domain decomposition, introduces a tradeoff for
generalization. On the one hand, XPINNs decompose the complex PDE solution into
several simple parts, which decreases the complexity needed to learn each part
and boosts generalization. On the other hand, decomposition leads to less
training data being available in each subdomain, and hence such model is
typically prone to overfitting and may become less generalizable. Empirically,
we choose five PDEs to show when XPINNs perform better than, similar to, or
worse than PINNs, hence demonstrating and justifying our new theory.
【10】 Explaining Convolutional Neural Networks by Tagging Filters
标题:用标记滤波器解释卷积神经网络
链接:https://arxiv.org/abs/2109.09389
作者:Anna Nguyen,Daniel Hagenmayer,Tobias Weller,Michael Färber
机构:Karlsruhe Institute of Technology, Karlsruhe, Germany, University of Mannheim, Mannheim, Germany
摘要:卷积神经网络(CNN)在各种图像分类任务中取得了惊人的性能,但人类很难理解分类是如何产生的。最近的文献提出了向人类解释分类过程的方法。这些方法主要集中在可视化特征图和过滤器权重上,对于分析CNN分类的非专家来说并不直观。在本文中,我们提出了FilTag,这是一种即使对非专家也能有效解释CNN的方法。其思想是,当一个类的图像经常激活某个卷积滤波器时,该滤波器就被标记为该类。这些标记解释了滤波器所检测到的类特定特征。基于标记,单个图像分类可以根据输入图像所激活的滤波器的标记得到直观解释。最后,我们证明了这些标记有助于分析由噪声输入图像引起的分类错误,并且标记可以被机器进一步处理。
摘要:Convolutional neural networks (CNNs) have achieved astonishing performance on
various image classification tasks, but it is difficult for humans to
understand how a classification comes about. Recent literature proposes methods
to explain the classification process to humans. These focus mostly on
visualizing feature maps and filter weights, which are not very intuitive for
non-experts in analyzing a CNN classification. In this paper, we propose
FilTag, an approach to effectively explain CNNs even to non-experts. The idea
is that when images of a class frequently activate a convolutional filter, then
that filter is tagged with that class. These tags provide an explanation to a
reference of a class-specific feature detected by the filter. Based on the
tagging, individual image classifications can then be intuitively explained in
terms of the tags of the filters that the input image activates. Finally, we
show that the tags are helpful in analyzing classification errors caused by
noisy input images and that the tags can be further processed by machines.
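The tagging rule, i.e. tag a filter with the classes whose images frequently activate it, can be sketched in a few lines. The input is assumed to be a precomputed matrix of per-image mean filter activations; the activation criterion and top-k cutoff are illustrative choices, not the paper's exact procedure.

import numpy as np

def tag_filters(activations, labels, n_classes, top_k=2):
    # activations: (n_images, n_filters) mean activation of each filter.
    # For each class, measure how often every filter is above the image's
    # average filter response, i.e. how frequently the class activates it.
    n_filters = activations.shape[1]
    freq = np.zeros((n_classes, n_filters))
    for c in range(n_classes):
        acts = activations[labels == c]
        freq[c] = (acts > acts.mean(axis=1, keepdims=True)).mean(axis=0)
    # Tag each filter with the classes that activate it most frequently.
    return [np.argsort(freq[:, f])[-top_k:][::-1] for f in range(n_filters)]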
【11】 Learning in Sinusoidal Spaces with Physics-Informed Neural Networks
标题:基于物理信息神经网络的正弦空间学习
链接:https://arxiv.org/abs/2109.09338
作者:Jian Cheng Wong,Chinchun Ooi,Abhishek Gupta,Yew-Soon Ong
机构:(IHPC), Agency for Science, Technology and Research (ASTAR), Singapore; School of Computer Science and Engineering, Nanyang Technological University, Singapore
备注:Currently under review
摘要:物理信息神经网络(PINN)使用物理增强损失函数,例如,结合控制微分方程的残差项,以确保其输出符合基本物理定律。然而,在实践中,对于许多问题,很难训练出精确的PINN模型。在这篇文章中,我们通过一个新的视角来解决这个问题,即在正弦空间中使用PINN学习的优点。通过分析模型初始化时的渐近行为,我们首先证明了尺寸增加(即宽度和深度)的PINN会导致对平坦输出的偏向。值得注意的是,平坦函数是许多物理微分方程的平凡解,因此会在远离真实解的情况下欺骗性地最小化增广损失的残差项。然后,我们证明,在我们称为sf-PINN的体系结构中,输入的正弦映射能够提高输出可变性,从而避免陷入欺骗性的局部极小值。此外,可变性水平可以被有效地调节,以匹配手头问题中的高频模式。本文的一个关键方面是全面的实证研究,该研究证明了在正弦空间中使用PINN学习跨越多个物理领域的各种正向和反向建模问题的有效性。
摘要:A physics-informed neural network (PINN) uses physics-augmented loss
functions, e.g., incorporating the residual term from governing differential
equations, to ensure its output is consistent with fundamental physics laws.
However, it turns out to be difficult to train an accurate PINN model for many
problems in practice. In this paper, we address this issue through a novel
perspective on the merits of learning in sinusoidal spaces with PINNs. By
analyzing asymptotic behavior at model initialization, we first prove that a
PINN of increasing size (i.e., width and depth) induces a bias towards flat
outputs. Notably, a flat function is a trivial solution to many physics
differential equations, hence, deceptively minimizing the residual term of the
augmented loss while being far from the true solution. We then show that the
sinusoidal mapping of inputs, in an architecture we label as sf-PINN, is able
to elevate output variability, thus avoiding being trapped in the deceptive
local minimum. In addition, the level of variability can be effectively
modulated to match high-frequency patterns in the problem at hand. A key facet
of this paper is the comprehensive empirical study that demonstrates the
efficacy of learning in sinusoidal spaces with PINNs for a wide range of
forward and inverse modelling problems spanning multiple physics domains.
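The architectural change is small: a sinusoidal feature map in front of the usual fully connected layers. A minimal numpy sketch, where the frequency scale is the assumed knob that modulates output variability to match high-frequency patterns:

import numpy as np

class SinusoidalMapping:
    # First-layer feature map x -> sin(Wx + b); the sf-PINN-style idea of
    # lifting inputs into a sinusoidal space before the dense layers.
    def __init__(self, in_dim, n_features, scale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = scale * rng.normal(size=(n_features, in_dim))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def __call__(self, x):  # x: (batch, in_dim)
        return np.sin(x @ self.W.T + self.b)

Larger scale values inject higher frequencies into the first-layer features, which is the kind of modulation the abstract refers to.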
【12】 Assisted Learning for Organizations with Limited Data
标题:面向有限数据组织的辅助学习
链接:https://arxiv.org/abs/2109.09307
作者:Cheng Chen,Jiaying Zhou,Jie Ding,Yi Zhou
机构:Department of ECE, University of Utah, Salt Lake City, UT , School of Statistics, University of Minnesota, Minneapolis, MN
备注:16 pages, 18 figures
摘要:我们开发了一个辅助学习框架,帮助组织级学习者在有限和不平衡的数据下提高学习成绩。特别是,组织级别的学习者通常拥有足够的计算资源,但受制于严格的协作策略和信息隐私。他们有限的不平衡数据常常导致有偏见的推理和次优决策。在我们的辅助学习框架中,组织学习者从服务提供商处购买辅助服务,目的是在几轮辅助中提高其模型性能。我们为辅助深度学习和辅助强化学习开发了有效的随机训练算法。与需要频繁传输梯度或模型的现有分布式算法不同,我们的框架允许学习者仅偶尔与服务提供商共享信息,并且仍然实现一个近乎oracle的模型,就好像所有数据都是集中的一样。
摘要:We develop an assisted learning framework for assisting organization-level
learners to improve their learning performance with limited and imbalanced
data. In particular, learners at the organization level usually have sufficient
computation resource, but are subject to stringent collaboration policy and
information privacy. Their limited imbalanced data often cause biased inference
and sub-optimal decision-making. In our assisted learning framework, an
organizational learner purchases assistance service from a service provider and
aims to enhance its model performance within a few assistance rounds. We
develop effective stochastic training algorithms for assisted deep learning and
assisted reinforcement learning. Different from existing distributed algorithms
that need to frequently transmit gradients or models, our framework allows the
learner to only occasionally share information with the service provider, and
still achieve a near-oracle model as if all the data were centralized.
【13】 Merlion: A Machine Learning Library for Time Series
标题:Merlion:一个面向时间序列的机器学习库
链接:https://arxiv.org/abs/2109.09265
作者:Aadyot Bhatnagar,Paul Kassianik,Chenghao Liu,Tian Lan,Wenzhuo Yang,Rowan Cassius,Doyen Sahoo,Devansh Arpit,Sri Subramanian,Gerald Woo,Amrita Saha,Arun Kumar Jagota,Gokulakrishnan Gopalakrishnan,Manpreet Singh,K C Krithika,Sukumar Maddineni,Daeki Cho,Bo Zong,Yingbo Zhou,Caiming Xiong,Silvio Savarese,Steven Hoi,Huan Wang
机构:AI Research, Salesforce, Monitoring Cloud, Salesforce, Warden AIOps, Salesforce, Service Protection, Salesforce
备注:22 pages, 1 figure, 14 tables
摘要:我们介绍Merlion,一个开放源码的时间序列机器学习库。它为许多常用模型和数据集提供了统一的接口,用于在单变量和多变量时间序列上进行异常检测和预测,以及标准的前/后处理层。它有几个模块来提高易用性,包括可视化、异常评分校准以提高可解释性、用于超参数调整和模型选择的AutoML以及模型集成。Merlion还提供了一个独特的评估框架,用于模拟生产中模型的实时部署和重新训练。该库旨在为工程师和研究人员提供一站式解决方案,以快速开发满足其特定时间序列需求的模型,并跨多个时间序列数据集对其进行基准测试。在本技术报告中,我们重点介绍了Merlion的体系结构和主要功能,并报告了不同基线模型和集成的基准数据。
摘要:We introduce Merlion, an open-source machine learning library for time
series. It features a unified interface for many commonly used models and
datasets for anomaly detection and forecasting on both univariate and
multivariate time series, along with standard pre/post-processing layers. It
has several modules to improve ease-of-use, including visualization, anomaly
score calibration to improve interpretability, AutoML for hyperparameter tuning
and model selection, and model ensembling. Merlion also provides a unique
evaluation framework that simulates the live deployment and re-training of a
model in production. This library aims to provide engineers and researchers a
one-stop solution to rapidly develop models for their specific time series
needs and benchmark them across multiple time series datasets. In this
technical report, we highlight Merlion's architecture and major
functionalities, and we report benchmark numbers across different baseline
models and ensembles.
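Merlion is installable from PyPI (pip install salesforce-merlion). The sketch below is adapted from the library's quick-start and uses a tiny synthetic series; exact module paths and defaults may differ between releases.

import numpy as np
import pandas as pd
from merlion.utils import TimeSeries
from merlion.models.defaults import DefaultDetectorConfig, DefaultDetector

# Synthetic univariate series as a stand-in for real monitoring data.
idx = pd.date_range("2021-01-01", periods=200, freq="h")
df = pd.DataFrame({"value": np.random.default_rng(0).normal(size=200)}, index=idx)
train_df, test_df = df[:150], df[150:]

train = TimeSeries.from_pd(train_df)
test = TimeSeries.from_pd(test_df)

model = DefaultDetector(DefaultDetectorConfig())
model.train(train_data=train)
labels = model.get_anomaly_label(time_series=test)  # calibrated anomaly scores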
【14】 Splitfed learning without client-side synchronization: Analyzing client-side split network portion size to overall performance
标题:无客户端同步的拆分式学习:分析客户端拆分式网络部分大小对整体性能的影响
链接:https://arxiv.org/abs/2109.09246
作者:Praveen Joshi,Chandra Thapa,Seyit Camtepe,Mohammed Hasanuzzamana,Ted Scully,Haithem Afli
机构:a Munster Technological University, Ireland, CSIRO Data, Australia
备注:CERC 2021
摘要:联邦学习(FL)、分割学习(SL)和分割联邦学习(SFL)是分布式机器学习的三个最新发展,由于它们能够保护原始数据的隐私性而受到关注。因此,它们广泛应用于数据敏感的各个领域,如大规模医学图像分类、医疗物联网和跨组织网络钓鱼电子邮件检测。SFL是在FL和SL的汇合点上开发的。它通过从FL范式提供并行的客户端机器学习模型更新,并通过在客户端和服务器之间拆分来自SL的模型,提供更高级别的模型隐私(在训练时),集两者之长。然而,由于客户端模型同步的要求,SFL在客户端有通信和计算开销。对于资源受限的客户端,需要移除这一要求以提高学习效率。在这方面,本文研究了无需客户端模型同步的SFL。由此产生的架构称为多头分割学习。在分布式客户端间IID数据分布的情况下,我们对MNIST数据上的ResNet18模型进行了实证研究,发现多头分割学习是可行的,其性能与SFL相当。此外,在MNIST测试集上,SFL的准确率仅比多头分割学习高1%-2%。为了进一步加强我们的研究结果,我们研究了采用不同客户端模型部分的多头分割学习及其对整体性能的影响。结果发现其对模型整体性能的影响最小。
摘要:Federated Learning (FL), Split Learning (SL), and SplitFed Learning (SFL) are
three recent developments in distributed machine learning that are gaining
attention due to their ability to preserve the privacy of raw data. Thus, they
are widely applicable in various domains where data is sensitive, such as
large-scale medical image classification, internet-of-medical-things, and
cross-organization phishing email detection. SFL is developed on the confluence
point of FL and SL. It brings the best of FL and SL by providing parallel
client-side machine learning model updates from the FL paradigm and a higher
level of model privacy (while training) by splitting the model between the
clients and server coming from SL. However, SFL has communication and
computation overhead at the client-side due to the requirement of client-side
model synchronization. For the resource-constrained client-side, removal of
such requirements is required to gain efficiency in the learning. In this
regard, this paper studies SFL without client-side model synchronization. The
resulting architecture is known as Multi-head Split Learning. Our empirical
studies considering the ResNet18 model on MNIST data under IID data
distribution among distributed clients find that Multi-head Split Learning is
feasible. Its performance is comparable to the SFL. Moreover, SFL provides only
1%-2% better accuracy than Multi-head Split Learning on the MNIST test set. To
further strengthen our results, we study the Multi-head Split Learning with
various client-side model portions and its impact on the overall performance.
To this end, our results find a minimal impact on the overall performance of
the model.
【15】 Identifying Ventricular Arrhythmias and Their Predictors by Applying Machine Learning Methods to Electronic Health Records in Patients With Hypertrophic Cardiomyopathy(HCM-VAr-Risk Model)
标题:应用机器学习方法识别肥厚型心肌病患者的室性心律失常及其预测因素(HCM-var-Risk模型)
链接:https://arxiv.org/abs/2109.09210
作者:Moumita Bhattacharya,Dai-Yin Lu,Shibani M Kudchadkar,Gabriela Villarreal Greenland,Prasanth Lingamaneni,Celia P Corona-Villalobos,Yufan Guan,Joseph E Marine,Jeffrey E Olgin,Stefan Zimmerman,Theodore P Abraham,Hagit Shatkay,Maria Roselle Abraham
机构: National Yang-Ming University, University of California San Francisco
备注:None
摘要:肥厚型心肌病(HC)心源性猝死(SCD)的临床风险分层采用源自美国心脏病学会基金会/美国心脏协会(ACCF/AHA)指南或HCM风险SCD模型(C指数为0.69)的规则,这些规则仅利用少数临床变量。我们评估了考虑更广泛变量的数据驱动机器学习方法能否有效识别发生可导致SCD的室性心律失常(VAr)的HC患者。我们扫描了711名HC患者的电子健康记录,以确定是否存在持续性室性心动过速或室颤。室性心动过速或室颤患者(n=61)被标记为VAr病例,其余患者(n=650)被标记为非VAr病例。使用2样本t检验和信息增益标准确定区分VAr和非VAr的信息量最大的临床变量;患者记录减少到只包括这些变量。由于VAr病例数量较少而导致的数据不平衡问题,通过过采样与欠采样策略的组合得以解决。我们在这种抽样方法下对多个分类器进行了训练和测试,显示了有效的分类。我们评估了93个临床变量,其中22个被证明是VAr的预测变量。基于这22个变量进行训练并纠正数据不平衡的logistic回归和朴素贝叶斯分类器的集成在将VAr与非VAr病例分离方面最为有效(敏感性=0.73,特异性=0.76,C指数=0.83)。我们的方法(HCM-VAr-Risk模型)确定了12个新的VAr预测因子,以及10个已建立的SCD预测因子。总之,这是机器学习首次应用于使用临床属性识别HC患者VAr。
摘要:Clinical risk stratification for sudden cardiac death (SCD) in hypertrophic
cardiomyopathy (HC) employs rules derived from American College of Cardiology
Foundation/American Heart Association (ACCF/AHA) guidelines or the HCM Risk-SCD
model (C-index of 0.69), which utilize a few clinical variables. We assessed
whether data-driven machine learning methods that consider a wider range of
variables can effectively identify HC patients with ventricular arrhythmias
(VAr) that lead to SCD. We scanned the electronic health records of 711 HC
patients for sustained ventricular tachycardia or ventricular fibrillation.
Patients with ventricular tachycardia or ventricular fibrillation (n = 61) were
tagged as VAr cases and the remaining (n = 650) as non-VAr. The 2-sample t test
and information gain criterion were used to identify the most informative
clinical variables that distinguish VAr from non-VAr; patient records were
reduced to include only these variables. Data imbalance stemming from low
number of VAr cases was addressed by applying a combination of over- and
under-sampling strategies. We trained and tested multiple classifiers under this
sampling approach, showing effective classification. We evaluated 93 clinical
variables, of which 22 proved predictive of VAr. The ensemble of logistic
regression and naive Bayes classifiers, trained based on these 22 variables and
corrected for data imbalance, was most effective in separating VAr from non-VAr
cases (sensitivity = 0.73, specificity = 0.76, C-index = 0.83). Our method
(HCM-VAr-Risk Model) identified 12 new predictors of VAr, in addition to 10
established SCD predictors. In conclusion, this is the first application of
machine learning for identifying HC patients with VAr, using clinical
attributes.
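The modelling backbone here (combined over-/under-sampling feeding a logistic-regression plus naive-Bayes ensemble) also reappears in the next entry 【16】. A generic scikit-learn/imbalanced-learn sketch; the sampling ratios and solver settings are arbitrary stand-ins, not the values used in the paper.

from imblearn.pipeline import Pipeline
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Over-sample the rare class, then under-sample the majority, then fit a
# soft-voting ensemble of logistic regression and naive Bayes.
clf = Pipeline(steps=[
    ("over", RandomOverSampler(sampling_strategy=0.5, random_state=0)),
    ("under", RandomUnderSampler(sampling_strategy=1.0, random_state=0)),
    ("ensemble", VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("nb", GaussianNB())],
        voting="soft")),
])
# clf.fit(X_train, y_train); clf.predict_proba(X_test)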
【16】 Machine Learning Methods for Identifying Atrial Fibrillation Cases and Their Predictors in Patients With Hypertrophic Cardiomyopathy: The HCM-AF-Risk Model
标题:识别肥厚型心肌病患者心房颤动病例及其预测因子的机器学习方法:HCM-AF-Risk模型
链接:https://arxiv.org/abs/2109.09207
作者:Moumita Bhattacharya,Dai-Yin Lu,Ioannis Ventoulis,Gabriela V. Greenland,Hulya Yalcin,Yufan Guan,Joseph E. Marine,Jeffrey E. Olgin,Stefan L. Zimmerman,Theodore P. Abraham,M. Roselle Abraham,Hagit Shatkay
机构:Computational Biomedicine and Machine Learning Lab, Department of Computer and Information Sciences, University of Delaware, Newark, Delaware, USA, Hypertrophic Cardiomyopathy Center of Excellence, Johns Hopkins University, Baltimore, Maryland, USA
备注:None
摘要:肥厚型心肌病(HCM)患者心房颤动(AF)发生率高,卒中风险增加,即使其充血性心力衰竭、高血压、年龄、糖尿病、既往卒中/短暂性脑缺血发作等风险评分较低。因此,有必要了解HCM中AF和卒中的病理生理学。在这项回顾性研究中,我们开发并应用一种数据驱动、基于机器学习的方法,利用电子健康记录数据识别房颤病例以及与房颤相关的临床和影像学特征。有阵发性/持续性/永久性房颤记录的HCM患者(n=191)被视为房颤病例,其余窦性心律患者(n=640)被标记为非房颤。我们评估了93个临床变量,并根据2样本t检验和信息增益标准选择了有助于区分房颤和非房颤病例的信息量最大的变量。我们确定了18个与HCM中AF正相关(n=11)和负相关(n=7)的高信息量变量。接下来,通过这18个变量表示患者记录。由于房颤病例数量相对较少而导致的数据不平衡,通过过采样和欠采样策略的组合得到解决。我们在这种抽样方法下训练和测试了多个分类器,显示了有效的分类。具体而言,基于这18个变量进行训练并纠正数据不平衡的logistic回归和朴素贝叶斯分类器的集成被证明对区分房颤和非房颤最有效(敏感性=0.74,特异性=0.70,C指数=0.80)。我们的模型是第一个基于机器学习识别HCM中房颤病例的方法。该模型显示了良好的性能,解决了数据不平衡问题,并表明AF与更严重的HCM心脏表型相关。
摘要:Hypertrophic cardiomyopathy (HCM) patients have a high incidence of atrial
fibrillation (AF) and increased stroke risk, even with low risk of congestive
heart failure, hypertension, age, diabetes, previous stroke/transient ischemic
attack scores. Hence, there is a need to understand the pathophysiology of AF
and stroke in HCM. In this retrospective study, we develop and apply a
data-driven, machine learning based method to identify AF cases, and clinical
and imaging features associated with AF, using electronic health record data.
HCM patients with documented paroxysmal/persistent/permanent AF (n = 191) were
considered AF cases, and the remaining patients in sinus rhythm (n = 640) were
tagged as No-AF. We evaluated 93 clinical variables and the most informative
variables useful for distinguishing AF from No-AF cases were selected based on
the 2-sample t test and the information gain criterion. We identified 18 highly
informative variables that are positively (n = 11) and negatively (n = 7)
correlated with AF in HCM. Next, patient records were represented via these 18
variables. Data imbalance resulting from the relatively low number of AF cases
was addressed via a combination of oversampling and under-sampling strategies.
We trained and tested multiple classifiers under this sampling approach,
showing effective classification. Specifically, an ensemble of logistic
regression and naive Bayes classifiers, trained based on the 18 variables and
corrected for data imbalance, proved most effective for separating AF from
No-AF cases (sensitivity = 0.74, specificity = 0.70, C-index = 0.80). Our model
is the first machine learning based method for identification of AF cases in
HCM. This model demonstrates good performance, addresses data imbalance, and
suggests that AF is associated with a more severe cardiac HCM phenotype.
【17】 Towards Zero-Label Language Learning
标题:走向零标签语言学习
链接:https://arxiv.org/abs/2109.09193
作者:Zirui Wang,Adams Wei Yu,Orhan Firat,Yuan Cao
机构:Google AI
摘要:本文探讨了自然语言处理(NLP)中的零标签学习,即在训练过程中不使用任何人工标注的数据,模型仅在合成数据上训练。我们框架的核心是一种更好地利用强大的预训练语言模型的新方法。具体地说,受最近GPT-3上Few-Shot推断成功的启发,我们提出了一个名为无监督数据生成(UDG)的训练数据创建过程,该过程利用Few-Shot提示合成高质量的训练数据,而不需要真正的人工注释。我们的方法实现了零标签学习,因为我们仅在合成数据上训练特定于任务的模型;尽管如此,与在人工标注数据上训练的强基线模型相比,我们获得了更好或相当的结果。此外,当与标记数据混合时,我们的方法可以作为一个高效的数据扩充过程,在SuperGLUE基准上获得最新的结果。
摘要:This paper explores zero-label learning in Natural Language Processing (NLP),
whereby no human-annotated data is used anywhere during training and models are
trained purely on synthetic data. At the core of our framework is a novel
approach for better leveraging the powerful pretrained language models.
Specifically, inspired by the recent success of few-shot inference on GPT-3, we
present a training data creation procedure named Unsupervised Data Generation
(UDG), which leverages few-shot prompts to synthesize high-quality training
data without real human annotations. Our method enables zero-label learning as
we train task-specific models solely on the synthetic data, yet we achieve
better or comparable results than strong baseline models trained on
human-labeled data. Furthermore, when mixed with labeled data, our approach
serves as a highly effective data augmentation procedure, achieving new
state-of-the-art results on the SuperGLUE benchmark.
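UDG is essentially careful prompt construction: few-shot demonstrations plus a target label, with the LM asked to complete a new input text. A schematic sketch; the prompt format is invented for illustration and lm_generate is a hypothetical placeholder for any large-LM completion API.

def build_udg_prompt(examples, target_label):
    # Few-shot demonstrations followed by an unfinished example whose input
    # text the language model is asked to synthesize.
    shots = "\n\n".join(f"Sentiment: {y}\nReview: {x}" for x, y in examples)
    return f"{shots}\n\nSentiment: {target_label}\nReview:"

prompt = build_udg_prompt(
    [("A moving, beautifully shot film.", "positive"),
     ("Dull plot and wooden acting.", "negative")],
    target_label="positive")
# synthetic_review = lm_generate(prompt)  # hypothetical LM completion call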
【18】 What BERT Based Language Models Learn in Spoken Transcripts: An Empirical Study
标题:基于BERT的语言模型从口语记录中学到什么:一项实证研究
链接:https://arxiv.org/abs/2109.09105
作者:Ayush Kumar,Mukuntha Narayanan Sundararaman,Jithendra Vepa
机构:Observe.AI, Bangalore, India
备注:BlackboxNLP @ EMNLP 2021 (15 pages, includes Appendix)
摘要:语言模型(LM)广泛应用于各种任务,包括口语理解(SLU)。口语需要仔细理解说话人之间的交互、对话状态和言语诱导的多模态行为,以产生有意义的会话表示。在这项工作中,我们建议将SLU分解为三个有代表性的特性:会话(不流利、停顿、过语)、通道(说话人类型、话轮任务)和ASR(插入、删除、替换)。我们探究基于BERT的语言模型(BERT、RoBERTa)在口语转录本上训练后,在没有任何语音线索的情况下理解各种属性的能力。实证结果表明,LM出人意料地擅长从词汇标记中捕捉会话属性,如停顿预测和过语检测。另一方面,LM在话轮任务和ASR错误预测上得分较低。此外,在口语转录本上对LM进行预训练会抑制其语言理解能力。最后,我们在两个基准数据集(Switchboard对话行为和不流利数据集)上验证了上述属性的有效性和可转移性。
摘要:Language Models (LMs) have been ubiquitously leveraged in various tasks
including spoken language understanding (SLU). Spoken language requires careful
understanding of speaker interactions, dialog states and speech induced
multimodal behaviors to generate a meaningful representation of the
conversation. In this work, we propose to dissect SLU into three representative
properties: conversational (disfluency, pause, overtalk), channel (speaker-type,
turn-tasks) and ASR (insertion, deletion, substitution). We probe BERT-based
language models (BERT, RoBERTa) trained on spoken transcripts to investigate
its ability to understand multifarious properties in absence of any speech
cues. Empirical results indicate that LM is surprisingly good at capturing
conversational properties such as pause prediction and overtalk detection from
lexical tokens. On the downsides, the LM scores low on turn-tasks and ASR
errors predictions. Additionally, pre-training the LM on spoken transcripts
restrain its linguistic understanding. Finally,we establish the efficacy and
transferability of the mentioned properties on two benchmark datasets:
Switchboard Dialog Act and Disfluency datasets.
【19】 On the Noise Stability and Robustness of Adversarially Trained Networks on NVM Crossbars
标题:基于NVM Crossbar的对抗性训练网络的噪声稳定性和鲁棒性
链接:https://arxiv.org/abs/2109.09060
作者:Deboleena Roy,Chun Tao,Indranil Chakraborty,Kaushik Roy
机构:Department of Electrical and Computer Engineering, Purdue University, West Lafayette, USA
备注:9 pages, 10 figures
摘要:基于深度神经网络(DNN)的应用在过去十年中呈指数级增长。为了满足其日益增长的计算需求,已经提出了几种基于非易失性存储器(NVM)交叉阵列的加速器。除了提高能源效率和性能外,这些近似硬件还具有防御对抗攻击的内在鲁棒性,这是DNN的一个重要安全问题。以前的工作集中在量化普通网络的这种内在鲁棒性,即在未扰动输入上训练的DNN。然而,DNN的对抗训练是鲁棒性的基准技术,仅仅依靠硬件的固有鲁棒性可能是不够的。在这项工作中,我们通过融合对抗训练和基于NVM交叉阵列的模拟硬件提供的内在鲁棒性,探索鲁棒DNN的设计。首先,我们研究了这种网络在未受干扰输入下的噪声稳定性,并观察到对抗训练网络的内部激活具有较低的信噪比(SNR),且比普通网络对噪声更敏感。因此,由于非理想计算,它们的性能明显下降,平均精度下降2倍。另一方面,对于使用投影梯度下降(PGD)白盒攻击生成的对抗图像,当攻击强度$\epsilon_{attack}$(输入扰动程度)大于对抗训练所用的$\epsilon_{train}$时,在CIFAR-10/100上进行对抗训练的ResNet-10/20得益于底层NVM交叉阵列,显示出5-10%的鲁棒精度增益。我们的结果表明,在模拟硬件上实现对抗训练网络需要在硬件非理想性和$\epsilon_{train}$之间进行仔细校准,以实现最佳鲁棒性和性能。
摘要:Applications based on Deep Neural Networks (DNNs) have grown exponentially in
the past decade. To match their increasing computational needs, several
Non-Volatile Memory (NVM) crossbar-based accelerators have been proposed. Apart
from improved energy efficiency and performance, these approximate hardware
also possess intrinsic robustness for defense against Adversarial Attacks,
which is an important security concern for DNNs. Prior works have focused on
quantifying this intrinsic robustness for vanilla networks, that is DNNs
trained on unperturbed inputs. However, adversarial training of DNNs is the
benchmark technique for robustness, and sole reliance on intrinsic robustness
of the hardware may not be sufficient. In this work, we explore the design of
robust DNNs through the amalgamation of adversarial training and the intrinsic
robustness offered by NVM crossbar-based analog hardware. First, we study the
noise stability of such networks on unperturbed inputs and observe that
internal activations of adversarially trained networks have lower
Signal-to-Noise Ratio (SNR), and are more sensitive to noise than vanilla networks.
As a result, they suffer significantly higher performance degradation due to
the non-ideal computations; on an average 2x accuracy drop. On the other hand,
for adversarial images generated using Projected-Gradient-Descent (PGD)
White-Box attacks, ResNet-10/20 adversarially trained on CIFAR-10/100 display a
5-10% gain in robust accuracy due to the underlying NVM crossbar when the
attack epsilon ($\epsilon_{attack}$, the degree of input perturbations) is
greater than the epsilon of the adversarial training ($\epsilon_{train}$). Our
results indicate that implementing adversarially trained networks on analog
hardware requires careful calibration between hardware non-idealities and
$\epsilon_{train}$ to achieve optimum robustness and performance.
【20】 Anti-Neuron Watermarking: Protecting Personal Data Against Unauthorized Neural Model Training
标题:反神经元水印:保护个人数据免受未经授权的神经模型训练
链接:https://arxiv.org/abs/2109.09023
作者:Zihang Zou,Boqing Gong,Liqiang Wang
机构:University of Central Florida, Google Research
摘要:在本文中,我们提出了一个新出现的个人数据保护问题,即用户个人数据(如图像)可能在未经授权的情况下被不当利用来训练深度神经网络模型。为了解决这个问题,我们在高级机器学习环境中重新审视了传统的水印技术。通过使用专门的线性颜色变换将水印签名嵌入用户图像,如果训练数据中包含水印图像,神经模型就会带有该签名。然后,第三方验证者可以通过从神经模型推断水印签名来验证潜在的未授权使用。我们进一步探索了水印和签名空间的期望特性,以实现令人信服的验证。通过大量实验,我们的经验表明,线性颜色变换能在多种现实设置下有效保护用户的个人图像。据我们所知,这是第一个在神经网络训练中保护用户个人数据不被未经授权使用的工作。
摘要:In this paper, we raise an emerging personal data protection problem where
user personal data (e.g. images) could be inappropriately exploited to train
deep neural network models without authorization. To solve this problem, we
revisit traditional watermarking in advanced machine learning settings. By
embedding a watermarking signature using specialized linear color
transformation to user images, neural models will be imprinted with such a
signature if training data include watermarked images. Then, a third-party
verifier can verify potential unauthorized usage by inferring the watermark
signature from neural models. We further explore the desired properties of
watermarking and signature space for convincing verification. Through extensive
experiments, we show empirically that linear color transformation is effective
in protecting user's personal images for various realistic settings. To the
best of our knowledge, this is the first work to protect users' personal data
from unauthorized usage in neural network training.
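The embedding step can be pictured as a single keyed 3x3 colour matrix applied to every pixel. A toy numpy sketch of that mechanism; the key-to-matrix mapping and perturbation strength are invented here, whereas the paper studies which signature spaces allow convincing verification.

import numpy as np

def embed_signature(image, key, strength=0.05):
    # Derive a near-identity 3x3 colour transform from the user's key.
    rng = np.random.default_rng(key)
    M = np.eye(3) + strength * rng.uniform(-1.0, 1.0, size=(3, 3))
    flat = image.reshape(-1, 3).astype(np.float64)
    out = np.clip(flat @ M.T, 0, 255)
    return out.reshape(image.shape).astype(image.dtype)

# watermarked = embed_signature(user_image, key=42)  # user_image: HxWx3 uint8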
【21】 A Machine Learning Pipeline to Examine Political Bias with Congressional Speeches
标题:一种通过国会演讲检验政治偏见的机器学习管道
链接:https://arxiv.org/abs/2109.09014
作者:Prasad hajare,Sadia Kamal,Siddharth Krishnan,Arunkumar Bagavathi
机构:Department of Computer Science, Oklahoma State University, ‡UNC Charlotte
摘要:由于数据的异质性、高维性、多模态性和规模,社交媒体中政治偏见建模的计算方法面临若干挑战。社交媒体中的政治偏见已经使用机器学习管道从多个角度进行了研究,如媒体偏见、政治意识形态、回声室和争议。当前的大多数方法严重依赖于人工标注的真值数据来完成底层的政治偏见预测任务。这些方法的局限性包括人力密集的标注、仅与特定问题相关的标签,以及无法确定社交媒体对话在不久的将来的偏见状态。在这项工作中,我们解决了这些问题,并给出了在没有人工标注数据的情况下研究两个意识形态不同的社交媒体论坛(Gab和Twitter)中政治偏见的机器学习方法。我们提出的方法利用从美国国会政治演讲中收集的文字记录来标注数据,在Twitter和Gab数据上预测政治偏见分别达到70.5%和65.1%的最高准确率。我们还提出了一种结合级联(cascade)和文本特征的机器学习方法,以约85%的准确率预测级联的政治偏见。
摘要:Computational methods to model political bias in social media involve several
challenges due to heterogeneity, high-dimensional, multiple modalities, and the
scale of the data. Political bias in social media has been studied in multiple
viewpoints like media bias, political ideology, echo chambers, and
controversies using machine learning pipelines. Most of the current methods
rely heavily on the manually-labeled ground-truth data for the underlying
political bias prediction tasks. Limitations of such methods include
human-intensive labeling, labels related to only a specific problem, and the
inability to determine the near future bias state of a social media
conversation. In this work, we address such problems and give machine learning
approaches to study political bias in two ideologically diverse social media
forums: Gab and Twitter without the availability of human-annotated data. Our
proposed methods exploit the use of transcripts collected from political
speeches in US congress to label the data and achieve the highest accuracy of
70.5% and 65.1% in Twitter and Gab data respectively to predict political bias.
We also present a machine learning approach that combines features from
cascades and text to forecast cascade's political bias with an accuracy of
about 85%.
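The distant-supervision idea can be sketched directly: congressional speeches carry their speaker's party, which stands in for a manual bias label, and the fitted classifier is then applied to social media text. A scikit-learn sketch with assumed corpora; the features and model are generic choices, not the paper's pipeline.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Distant supervision: the speaker's party replaces a manual bias label.
bias_clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2),
                         LogisticRegression(max_iter=1000))
# bias_clf.fit(speech_texts, speaker_parties)   # assumed labeled speeches
# bias_clf.predict(["text of a tweet or Gab post"])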
【22】 AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks
标题:AutoInit:神经网络的解析保信号权值初始化
链接:https://arxiv.org/abs/2109.08958
作者:Garrett Bingham,Risto Miikkulainen
机构: The University of Texas at Austin, Cognizant AI Labs
备注:15 pages, 9 figures, 1 table
摘要:神经网络需要仔细的权值初始化,以防止信号爆炸或消失。现有的初始化方案通过假设网络具有特定的激活函数或拓扑,在特定情况下解决此问题。这样的权重初始化策略很难推导,因此现代体系结构经常使用这些相同的初始化方案,即使它们的假设不成立。本文介绍了AutoInit,一种自动适应不同神经网络结构的权值初始化算法。通过解析地跟踪信号在网络中传播时的均值和方差,AutoInit能够适当地缩放每层的权重,以避免信号爆炸或消失。实验表明,AutoInit在一系列激活函数、dropout、权重衰减、学习率和归一化设置下提高了各种卷积和残差网络的性能。此外,在神经架构搜索和激活函数元学习中,AutoInit自动为数千个独特架构和数百个独特激活函数计算专门的权重初始化策略,并提高视觉、语言、表格、多任务和迁移学习场景中的性能。因此,AutoInit可以作为一种自动配置工具,使新神经网络架构的设计更加健壮。AutoInit包提供了现有TensorFlow模型的包装器,可在https://github.com/cognizant-ai-labs/autoinit 获取。
摘要:Neural networks require careful weight initialization to prevent signals from
exploding or vanishing. Existing initialization schemes solve this problem in
specific cases by assuming that the network has a certain activation function
or topology. It is difficult to derive such weight initialization strategies,
and modern architectures therefore often use these same initialization schemes
even though their assumptions do not hold. This paper introduces AutoInit, a
weight initialization algorithm that automatically adapts to different neural
network architectures. By analytically tracking the mean and variance of
signals as they propagate through the network, AutoInit is able to
appropriately scale the weights at each layer to avoid exploding or vanishing
signals. Experiments demonstrate that AutoInit improves performance of various
convolutional and residual networks across a range of activation function,
dropout, weight decay, learning rate, and normalizer settings. Further, in
neural architecture search and activation function meta-learning, AutoInit
automatically calculates specialized weight initialization strategies for
thousands of unique architectures and hundreds of unique activation functions,
and improves performance in vision, language, tabular, multi-task, and transfer
learning scenarios. AutoInit thus serves as an automatic configuration tool
that makes design of new neural network architectures more robust. The AutoInit
package provides a wrapper around existing TensorFlow models and is available
at https://github.com/cognizant-ai-labs/autoinit.
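For a dense layer followed by ReLU, the analytic bookkeeping reduces to a known scaling rule (He initialization); AutoInit's contribution is performing this mean/variance tracking automatically for arbitrary layer types. A two-activation sketch of the underlying computation, with the gain table an assumed simplification:

import numpy as np

def autoinit_std(fan_in, activation="relu"):
    # Weight std chosen so the second moment of the layer output stays at 1;
    # 0.5 is E[relu(z)^2] / Var(z) for zero-mean Gaussian pre-activations.
    gain = {"relu": 0.5, "linear": 1.0}[activation]
    return 1.0 / np.sqrt(fan_in * gain)

rng = np.random.default_rng(0)
x = rng.normal(size=(100_000, 256))                  # unit-variance input
w = rng.normal(size=(256, 256)) * autoinit_std(256)  # scaled weights
h = np.maximum(x @ w, 0.0)                           # ReLU layer
print((h ** 2).mean())                               # ~1.0: signal preserved

The printed second moment staying near 1 across layers is exactly the property AutoInit maintains analytically instead of empirically.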
【23】 Text Detoxification using Large Pre-trained Neural Models
标题:基于大型预训练神经模型的文本去毒
链接:https://arxiv.org/abs/2109.08914
作者:David Dale,Anton Voronov,Daryna Dementieva,Varvara Logacheva,Olga Kozlova,Nikita Semenov,Alexander Panchenko
机构:‡Skolkovo Institute of Science and Technology, Moscow, Russia, †Mobile TeleSystems (MTS), Moscow, Russia
备注:Submitted to the EMNLP 2021 conference
摘要:我们提出了两种新的无监督方法来消除文本中的毒性。我们的第一种方法结合了两个最新的想法:(1)使用小样式条件语言模型指导生成过程;(2)使用释义模型执行样式转换。我们使用一个由经过风格训练的语言模型指导的表现良好的释义者来保留文本内容并消除毒性。我们的第二种方法是用非冒犯性同义词替换有毒词。我们使该方法更加灵活,允许BERT用可变数量的单词替换掩码标记。最后,我们提出了第一个关于毒性去除任务的风格转移模型的大规模比较研究。我们将我们的模型与许多风格转换方法进行比较。使用无监督风格转移度量组合,以无参考的方式对模型进行评估。我们建议的两种方法都会产生新的SOTA结果。
摘要:We present two novel unsupervised methods for eliminating toxicity in text.
Our first method combines two recent ideas: (1) guidance of the generation
process with small style-conditional language models and (2) use of
paraphrasing models to perform style transfer. We use a well-performing
paraphraser guided by style-trained language models to keep the text content
and remove toxicity. Our second method uses BERT to replace toxic words with
their non-offensive synonyms. We make the method more flexible by enabling BERT
to replace mask tokens with a variable number of words. Finally, we present the
first large-scale comparative study of style transfer models on the task of
toxicity removal. We compare our models with a number of methods for style
transfer. The models are evaluated in a reference-free way using a combination
of unsupervised style transfer metrics. Both methods we suggest yield new SOTA
results.
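The second method's core step, replacing a toxic word with an in-context substitute proposed by BERT, maps directly onto the Hugging Face fill-mask pipeline. A single-token sketch (the paper extends BERT to variable-length replacements, and a real detoxifier would also filter candidates for toxicity):

from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

def replace_word(sentence, toxic_word, top_k=5):
    # Mask the word and let BERT propose substitutes that fit the context.
    masked = sentence.replace(toxic_word, unmasker.tokenizer.mask_token, 1)
    return [c["token_str"] for c in unmasker(masked, top_k=top_k)]

print(replace_word("this is a stupid idea", "stupid"))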
【24】 Learning to Regrasp by Learning to Place
标题:通过学习放置来学习重新抓取
链接:https://arxiv.org/abs/2109.08817
作者:Shuo Cheng,Kaichun Mo,Lin Shao
机构:University of California, San Diego, Stanford University
备注:Accepted to Conference on Robot Learning (CoRL) 2021
摘要:在本文中,我们探讨了机器人是否可以学习重新抓取一组不同的物体,以实现各种期望的抓取姿势。每当机器人当前的抓取姿势无法执行所需的操纵任务时,就需要重新抓取。赋予机器人这种能力在制造业或家庭服务业等许多领域都有应用。然而,由于日常物体几何形状的巨大多样性以及状态和动作空间的高维性,这是一项具有挑战性的任务。在本文中,我们提出了一个机器人系统,该系统将物体和支撑环境的部分点云作为输入,并输出一系列拾取和放置操作,以将初始物体抓取姿势转换为所需的物体抓取姿势。关键技术包括一个神经稳定放置预测器和一个通过利用和改变周围环境的基于重抓取图的求解方案。我们引入了一个新的且具有挑战性的合成数据集来学习和评估所提出的方法。在这个数据集上,我们表明我们的系统对多种不同物体能够实现73.3%的重新抓取成功率。
摘要:In this paper, we explore whether a robot can learn to regrasp a diverse set
of objects to achieve various desired grasp poses. Regrasping is needed
whenever a robot's current grasp pose fails to perform desired manipulation
tasks. Endowing robots with such an ability has applications in many domains
such as manufacturing or domestic services. Yet, it is a challenging task due
to the large diversity of geometry in everyday objects and the high
dimensionality of the state and action space. In this paper, we propose a
system for robots to take partial point clouds of an object and the supporting
environment as inputs and output a sequence of pick-and-place operations to
transform an initial object grasp pose to the desired object grasp poses. The
key technique includes a neural stable placement predictor and a regrasp graph
based solution through leveraging and changing the surrounding environment. We
introduce a new and challenging synthetic dataset for learning and evaluating
the proposed approach. In this dataset, we show that our system is able to
achieve 73.3% success rate of regrasping diverse objects.
【25】 Learning to be Fair: A Consequentialist Approach to Equitable Decision-Making
标题:学会公平:公平决策的结果主义方法
链接:https://arxiv.org/abs/2109.08792
作者:Alex Chohlas-Wood,Madison Coots,Emma Brunskill,Sharad Goel
机构: Department of Management Science & Engineering, Stanford University, CA; Department of Computer Science, Stanford University, CA; Kennedy School, Harvard University
摘要:在设计公平机器学习系统的主导范式中,人们努力确保模型预测满足各种公平标准,如种族、性别和其他受法律保护的特征的错误率均等。然而,这种方法通常会使预测与它们最终影响的下游结果脱节,因此可能导致意外伤害。在这里,我们提出了一个直接预测行动后果的替代性公平框架。利益相关者首先指定对算法知情决策过程可能结果的偏好。例如,贷款人可能更愿意向最有可能偿还贷款的人提供信贷,同时也更愿意在各个街区提供类似的贷款利率。然后搜索决策策略的空间以最大化指定的效用。我们开发并描述了一种方法,用于从大量表达效用函数的数据中有效地学习这些最优策略,从而促进更全面的公平决策方法。
摘要:In the dominant paradigm for designing equitable machine learning systems,
one works to ensure that model predictions satisfy various fairness criteria,
such as parity in error rates across race, gender, and other legally protected
traits. That approach, however, typically divorces predictions from the
downstream outcomes they ultimately affect, and, as a result, can induce
unexpected harms. Here we present an alternative framework for fairness that
directly anticipates the consequences of actions. Stakeholders first specify
preferences over the possible outcomes of an algorithmically informed
decision-making process. For example, lenders may prefer extending credit to
those most likely to repay a loan, while also preferring similar lending rates
across neighborhoods. One then searches the space of decision policies to
maximize the specified utility. We develop and describe a method for
efficiently learning these optimal policies from data for a large family of
expressive utility functions, facilitating a more holistic approach to
equitable decision-making.
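The lending example can be made concrete with a toy utility that rewards expected repayments and penalizes the lending-rate gap across neighborhoods, maximized here over a one-dimensional threshold policy. All data and the penalty weight are simulated assumptions; the paper learns far richer policy classes from data.

import numpy as np

rng = np.random.default_rng(0)
group = rng.binomial(1, 0.4, size=5000)                     # neighborhood
p_repay = np.clip(rng.normal(0.5 + 0.1 * group, 0.2), 0, 1)

def utility(threshold, lam=0.3):
    # Expected repayments minus a penalty on the lending-rate gap between
    # the two neighborhoods (a stakeholder-specified preference).
    lend = p_repay >= threshold
    value = p_repay[lend].sum()
    gap = abs(lend[group == 0].mean() - lend[group == 1].mean())
    return value - lam * len(lend) * gap

best = max(np.linspace(0, 1, 101), key=utility)  # search the policy space
print(best)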
【26】 Long-Range Modeling of Source Code Files with eWASH: Extended Window Access by Syntax Hierarchy
标题:用EWASH对源代码文件进行远程建模:按语法层次扩展窗口访问
链接:https://arxiv.org/abs/2109.08780
作者:Colin B. Clement,Shuai Lu,Xiaoyu Liu,Michele Tufano,Dawn Drain,Nan Duan,Neel Sundaresan,Alexey Svyatkovskiy
机构:Microsoft Cloud and AI, Microsoft Research
备注:EMNLP 2021 camera ready
摘要:使用transformers的统计语言建模和翻译在程序理解和生成任务中发现了许多成功的应用,为现代软件开发环境中的工具设置了高基准。然而,这些神经模型的有限上下文窗口意味着它们将无法利用任何给定任务的大文件和包的整个相关上下文。虽然有很多努力来扩展上下文窗口,但我们引入了一种独立于体系结构的方法来利用源代码的语法层次结构,将整个文件级上下文合并到一个固定长度的窗口中。使用每个源文件的具体语法树,我们提取语法层次结构,并通过有选择地从视图中删除给定任务的更具体、不太相关的作用域,将它们集成到上下文窗口中。我们在代码生成任务和Python编程语言中自然语言和源代码的联合翻译方面对这种方法进行了评估,在CodeXGLUE基准测试中实现了Python代码完成和摘要的最新技术。我们还为用户体验驱动的任务引入了新的CodeXGLUE基准:规范化文本的代码完成、基于文件级上下文的方法体完成/代码摘要。
摘要:Statistical language modeling and translation with transformers have found
many successful applications in program understanding and generation tasks,
setting high benchmarks for tools in modern software development environments.
The finite context window of these neural models means, however, that they will
be unable to leverage the entire relevant context of large files and packages
for any given task. While there are many efforts to extend the context window,
we introduce an architecture-independent approach for leveraging the syntactic
hierarchies of source code for incorporating entire file-level context into a
fixed-length window. Using concrete syntax trees of each source file we extract
syntactic hierarchies and integrate them into context window by selectively
removing from view more specific, less relevant scopes for a given task. We
evaluate this approach on code generation tasks and joint translation of
natural language and source code in Python programming language, achieving a
new state-of-the-art in code completion and summarization for Python in the
CodeXGLUE benchmark. We also introduce new CodeXGLUE benchmarks for
user-experience-motivated tasks: code completion with normalized literals,
method body completion/code summarization conditioned on file-level context.
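For Python sources, the syntax-hierarchy idea can be approximated with the standard ast module: keep the focal scope intact and collapse less relevant function bodies so whole-file context fits a fixed window. The pruning rule below is a simplification of the paper's prioritization, not its algorithm.

import ast

def file_level_context(source, focal_function):
    # Keep the focal function's body; reduce every other function to its
    # signature with a "..." stub so whole-file context stays compact.
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) \
                and node.name != focal_function:
            node.body = [ast.Expr(ast.Constant(Ellipsis))]
    return ast.unparse(tree)  # requires Python 3.9+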
【27】 Barely Biased Learning for Gaussian Process Regression
标题:高斯过程回归的几乎无偏学习
链接:https://arxiv.org/abs/2109.09417
作者:David R. Burt,Artem Artemev,Mark van der Wilk
机构:Department of Engineering, University of Cambridge, Cambridge, UK, Department of Computing, Imperial College London, London, UK
摘要:最近在可伸缩近似高斯过程回归方面的工作讨论了在估计对数边际似然时偏差、方差与计算量之间的权衡。我们提出了一种方法,在估计对数边际似然时自适应地选择要使用的计算量,从而保证目标函数的偏差很小。虽然原则上很简单,但我们目前的实现在计算上与现有的近似方法相比不具有竞争力。
摘要:Recent work in scalable approximate Gaussian process regression has discussed
a bias-variance-computation trade-off when estimating the log marginal
likelihood. We suggest a method that adaptively selects the amount of
computation to use when estimating the log marginal likelihood so that the bias
of the objective function is guaranteed to be small. While simple in principle,
our current implementation of the method is not competitive computationally
with existing approximations.
【28】 Performance and accuracy assessments of an incompressible fluid solver coupled with a deep Convolutional Neural Network
标题:与深度卷积神经网络耦合的不可压缩流体解算器的性能和精度评估
链接:https://arxiv.org/abs/2109.09363
作者:Ekhi Ajuria Illarramendi,Michaël Bauerheim,Bénédicte Cuenot
机构:ISAE-SUPAERO CERFACS, Université de Toulouse, Toulouse (France)
摘要:泊松方程的求解通常是不可压缩流体求解器中计算量最大的步骤之一。最近,深度学习,特别是卷积神经网络(CNN)被引入求解该方程,导致推理时间显著减少,但代价是无法保证解的准确性。这一缺点可能导致不准确和潜在的不稳定模拟,也使得无法公平评估CNN的加速,例如在改变网络结构时,因为评估处于不同的误差水平。为了避免这个问题,开发了一种混合策略,该策略将CNN与传统的迭代求解器耦合,以确保用户定义的精度水平。CNN混合方法在两种流动情况下进行了测试,即有障碍物和无障碍物的变密度羽流,证明了显著的泛化能力,确保了模拟的准确性和稳定性。进一步研究了使用几种网络结构的预测误差分布。结果表明,以速度场平均散度定义的混合策略阈值确保了基于CNN的混合计算策略的一致物理行为。该策略允许在相同精度水平上对各种网络架构的CNN性能进行系统评估。特别是,在网络体系结构中加入多尺度的重要性得到了证明,因为与前馈CNN体系结构相比,这些网络同时提高了精度和推理性能,可以提供比传统迭代求解器快10-25倍的解。
摘要:The resolution of the Poisson equation is usually one of the most
computationally intensive steps for incompressible fluid solvers. Lately, Deep
Learning, and especially Convolutional Neural Networks (CNN), has been
introduced to solve this equation, leading to significant inference time
reduction at the cost of a lack of guarantee on the accuracy of the solution.
This drawback might lead to inaccuracies and potentially unstable simulations.
It also makes impossible a fair assessment of the CNN speedup, for instance,
when changing the network architecture, since evaluated at different error
levels. To circumvent this issue, a hybrid strategy is developed, which couples
a CNN with a traditional iterative solver to ensure a user-defined accuracy
level. The CNN hybrid method is tested on two flow cases, consisting of a
variable-density plume with and without obstacles, demonstrating remarkable
generalization capabilities, ensuring both the accuracy and stability of the
simulations. The error distribution of the predictions using several network
architectures is further investigated. Results show that the threshold of the
hybrid strategy defined as the mean divergence of the velocity field is
ensuring a consistent physical behavior of the CNN-based hybrid computational
strategy. This strategy allows a systematic evaluation of the CNN performance
at the same accuracy level for various network architectures. In particular,
the importance of incorporating multiple scales in the network architecture is
demonstrated, since improving both the accuracy and the inference performance
compared with feedforward CNN architectures, as these networks can provide
solutions 10-25 times faster than traditional iterative solvers.
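The hybrid control flow is simple to state: accept the CNN prediction when the mean divergence of the induced velocity field is under the user threshold, otherwise hand the guess to the classical solver. A schematic sketch with all callables assumed:

import numpy as np

def hybrid_poisson(rhs, cnn_predict, iterative_solve, divergence, tol):
    # Accept the CNN solution only if the mean divergence criterion holds;
    # otherwise fall back to (and warm-start) the iterative solver.
    phi = cnn_predict(rhs)
    if np.abs(divergence(phi)).mean() < tol:
        return phi
    return iterative_solve(rhs, x0=phi)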
【29】 Deformed semicircle law and concentration of nonlinear random matrices for ultra-wide neural networks
标题:超宽神经网络的变形半圆律与非线性随机矩阵的聚集性
链接:https://arxiv.org/abs/2109.09304
作者:Zhichao Wang,Yizhe Zhu
备注:46 pages, 5 figures
摘要:本文研究由 $f(X)=\frac{1}{\sqrt{d_1}}\boldsymbol{a}^\top\sigma\left(WX\right)$ 给出的两层全连接神经网络,其中 $X\in\mathbb{R}^{d_0\times n}$ 是确定性数据矩阵,$W\in\mathbb{R}^{d_1\times d_0}$ 和 $\boldsymbol{a}\in\mathbb{R}^{d_1}$ 是随机高斯权重,$\sigma$ 是非线性激活函数。我们得到了与 $f(X)$ 相关的两个核矩阵,即经验共轭核(CK)和神经切线核(NTK),在线性宽度区域($d_1\asymp n$)之外的极限谱分布。在超宽区域 $d_1/n\to\infty$ 下,在对 $X$ 和 $\sigma$ 的适当假设下,出现了变形的半圆律。这种极限律首先被证明适用于具有相关性的一般中心化样本协方差矩阵,然后被具体应用于我们的神经网络模型。我们还证明了经验CK和NTK在谱范数下围绕其极限核的非渐近集中,以及它们最小特征值的下界。作为一个应用,我们验证了随机特征回归在超宽极限下与其极限核回归具有相同的渐近性能;随机特征回归的极限训练和测试误差由相应的核回归计算得到。我们还提供了一个适用于具有随机权重和Lipschitz激活函数的神经网络的非线性Hanson-Wright不等式。
摘要:In this paper, we study the two-layer fully connected neural network given by
$f(X)=\frac{1}{\sqrt{d_1}}\boldsymbol{a}^\top\sigma\left(WX\right)$, where
$X\in\mathbb{R}^{d_0\times n}$ is a deterministic data matrix,
$W\in\mathbb{R}^{d_1\times d_0}$ and $\boldsymbol{a}\in\mathbb{R}^{d_1}$ are
random Gaussian weights, and $\sigma$ is a nonlinear activation function. We
obtain the limiting spectral distributions of two kernel matrices related to
$f(X)$: the empirical conjugate kernel (CK) and neural tangent kernel (NTK),
beyond the linear-width regime ($d_1\asymp n$). Under the ultra-width regime
$d_1/n\to\infty$, with proper assumptions on $X$ and $\sigma$, a deformed
semicircle law appears. Such limiting law is first proved for general centered
sample covariance matrices with correlation and then specified for our neural
network model. We also prove non-asymptotic concentrations of empirical CK and
NTK around their limiting kernel in the spectral norm, and lower bounds on
their smallest eigenvalues. As an application, we verify the random feature
regression achieves the same asymptotic performance as its limiting kernel
regression in ultra-width limit. The limiting training and test errors for
random feature regression are calculated by corresponding kernel regression. We
also provide a nonlinear Hanson-Wright inequality suitable for neural networks
with random weights and Lipschitz activation functions.
【30】 Locally-symplectic neural networks for learning volume-preserving dynamics
标题:学习保体动力学的局部辛神经网络
链接:https://arxiv.org/abs/2109.09151
作者:Jānis Bajārs
机构: Mathematics and Optometry, University of Latvia, Jelgavas Street 3, Riga
摘要:We propose locally-symplectic neural networks LocSympNets for learning
volume-preserving dynamics. The construction of LocSympNets stems from the
theorem of local Hamiltonian description of the vector field of a
volume-preserving dynamical system and the splitting methods based on
symplectic integrators. Modified gradient modules of recently proposed
symplecticity-preserving neural networks SympNets are used to construct
locally-symplectic modules, which composition results in volume-preserving
neural networks. LocSympNets are studied numerically considering linear and
nonlinear dynamics, i.e., semi-discretized advection equation and Euler
equations of the motion of a free rigid body, respectively. LocSympNets are
able to learn linear and nonlinear dynamics to a high degree of accuracy. When
learning a single trajectory of the rigid body dynamics, LocSympNets are able to
learn both invariants of the system, with absolute relative errors below 1% in
long-time predictions, and they produce qualitatively good short-time
predictions when learning of the whole system from randomly sampled data is
considered.
【31】 Model-Based Approach for Measuring the Fairness in ASR
标题:基于模型的ASR公平性度量方法
链接:https://arxiv.org/abs/2109.09061
作者:Zhe Liu,Irina-Elena Veliche,Fuchun Peng
机构:Facebook AI, Menlo Park, CA, USA
摘要:The issue of fairness arises when the automatic speech recognition (ASR)
systems do not perform equally well for all subgroups of the population. In any
fairness measurement studies for ASR, the open questions of how to control the
nuisance factors, how to handle unobserved heterogeneity across speakers, and
how to trace the source of any word error rate (WER) gap among different
subgroups are especially important - if not appropriately accounted for,
incorrect conclusions will be drawn. In this paper, we introduce mixed-effects
Poisson regression to better measure and interpret any WER difference among
subgroups of interest. Particularly, the presented method can effectively
address the three problems raised above and is very flexible to use in
practical disparity analyses. We demonstrate the validity of proposed
model-based approach on both synthetic and real-world speech data.
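The modelling idea can be sketched with a fixed-effects Poisson regression: word errors per utterance are counts, and an offset for utterance length turns the subgroup coefficient into a relative WER ratio. The paper's method additionally includes per-speaker random effects (mixed effects), which this statsmodels sketch omits; the data below is simulated.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_words = rng.integers(5, 50, size=1000)        # reference length per utterance
subgroup = rng.binomial(1, 0.5, size=1000)      # subgroup indicator
errors = rng.poisson(n_words * 0.1 * np.exp(0.2 * subgroup))

X = sm.add_constant(subgroup.astype(float))
model = sm.GLM(errors, X, family=sm.families.Poisson(),
               offset=np.log(n_words))          # offset makes rates comparable
print(model.fit().params)  # exp(subgroup coef) ~ relative WER ratio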
【32】 KNN Learning Techniques for Proportional Myocontrol in Prosthetics
标题:KNN学习技术在假肢比例肌控中的应用
链接:https://arxiv.org/abs/2109.08917
作者:Tim Sziburis,Markus Nowak,Davide Brunelli
机构: Department of Industrial Engineering, University of Trento
摘要:This work has been conducted in the context of pattern-recognition-based
control for electromyographic prostheses. It presents a k-nearest neighbour
(kNN) classification technique for gesture recognition, extended by a
proportionality scheme. The methods proposed are practically implemented and
validated. Datasets are captured by means of a state-of-the-art 8-channel
electromyography (EMG) armband positioned on the forearm. Based on this data,
the influence of kNN's parameters is analyzed in pilot experiments. Moreover,
the effect of proportionality scaling and rest thresholding schemes is
investigated. A randomized, double-blind user study is conducted to compare the
implemented method with the state-of-research algorithm Ridge Regression with
Random Fourier Features (RR-RFF) for different levels of gesture exertion. The
results from these experiments show a statistically significant improvement in
favour of the kNN-based algorithm.
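A compact sketch of the classification-plus-proportionality scheme: kNN predicts the gesture from EMG features while the mean signal amplitude, after rest thresholding, provides the proportional activation level. The mean-absolute-value feature and the threshold handling are assumed simplifications of the paper's setup.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = np.abs(rng.normal(size=(60, 8)))  # per-window MAV features, 8 channels
y_train = rng.integers(0, 3, size=60)       # three gesture classes (toy data)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

def proportional_command(window, rest_level, max_level):
    # window: (n_samples, 8) raw EMG window from the armband.
    feats = np.abs(window).mean(axis=0, keepdims=True)  # mean absolute value
    amp = feats.mean()
    if amp < rest_level:                                # rest thresholding
        return "rest", 0.0
    scale = np.clip((amp - rest_level) / (max_level - rest_level), 0.0, 1.0)
    return knn.predict(feats)[0], float(scale)          # gesture + level in [0,1]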
【33】 Underwater Image Enhancement Using Convolutional Neural Network
标题:基于卷积神经网络的水下图像增强
链接:https://arxiv.org/abs/2109.08916
作者:Anushka Yadav,Mayank Upadhyay,Ghanapriya Singh
机构: National Institute of Technology, Department of Electronics Engineering, Uttarakhand
摘要:This work proposes a method for underwater image enhancement using the
principle of histogram equalization. Since underwater images have a global
strong dominant colour, their colourfulness and contrast are often degraded.
Before applying the histogram equalisation technique on the image, the image is
converted from coloured image to a gray scale image for further operations.
Histogram equalization is a technique for adjusting image intensities to
enhance contrast. The colours of the image are retained using a convolutional
neural network model which is trained by the datasets of underwater images to
give better results.
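The preprocessing chain is easy to reproduce with OpenCV; the CNN colour-restoration stage is the paper's contribution and is not shown. The file path is a placeholder, and equalizing the Y channel at the end is one simple colour-preserving alternative, not the paper's method.

import cv2

img = cv2.imread("underwater.png")              # placeholder path, BGR image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)    # colour -> grayscale
equalized = cv2.equalizeHist(gray)              # contrast enhancement

# Colour-preserving variant: equalize only the luminance channel.
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
enhanced = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)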
【34】 Analyzing the Habitable Zones of Circumbinary Planets Using Machine Learning
标题:基于机器学习的环行双星宜居区分析
链接:https://arxiv.org/abs/2109.08735
作者:Zhihui Kong,Jonathan H. Jiang,Remo Burn,Kristen A. Fahy,Zonghong Zhu
机构:Department of Astronomy, Beijing Normal University, Beijing, China; Jet Propulsion Laboratory, California Institute of Technology, Pasadena, USA; Max Planck Institute for Astronomy, Königstuhl, Heidelberg, Germany
备注:arXiv admin note: text overlap with arXiv:2101.02316
摘要:Exoplanet detection in the past decade by efforts including NASA's Kepler and
TESS missions has discovered many worlds that differ substantially from planets
in our own Solar System, including more than 150 exoplanets orbiting binary or
multi-star systems. This not only broadens our understanding of the diversity
of exoplanets, but also promotes our study of exoplanets in the complex binary
systems and provides motivation to explore their habitability. In this study,
we investigate the Habitable Zones of circumbinary planets based on planetary
trajectory and dynamically informed habitable zones. Our results indicate that
the mass ratio and orbital eccentricity of binary stars are important factors
affecting the orbital stability and habitability of planetary systems.
Moreover, planetary trajectory and dynamically informed habitable zones divide
planetary habitability into three categories: habitable, part-habitable and
uninhabitable. Therefore, we train a machine learning model to quickly and
efficiently classify these planetary systems.
【35】 Proteome-informed machine learning studies of cocaine addiction
标题:基于蛋白质组信息的可卡因成瘾机器学习研究
链接:https://arxiv.org/abs/2109.08718
作者:Kaifu Gao,Dong Chen,Alfred J Robison,Guo-Wei Wei
机构: Department of Mathematics, Michigan State University, MI, USA; Department of Physiology; Department of Electrical and Computer Engineering; Department of Biochemistry and Molecular Biology
摘要:Cocaine addiction accounts for a large portion of substance use disorders and
threatens millions of lives worldwide. There is an urgent need to come up with
efficient anti-cocaine addiction drugs. Unfortunately, no medications have been
approved by the Food and Drug Administration (FDA), despite the extensive
effort in the past few decades. The main challenge is the intricate molecular
mechanisms of cocaine addiction, involving synergistic interactions among
proteins upstream and downstream of dopamine transporter (DAT) functions
impacted by cocaine. However, traditional in vivo or in vitro experiments
cannot address the roles of so many proteins, highlighting the need for innovative
strategies in the field. We propose a proteome-informed machine learning/deep
learning (ML/DL) platform to discover nearly optimal anti-cocaine addiction
lead compounds. We construct and analyze proteomic protein-protein interaction
(PPI) networks for cocaine dependence to identify 141 involved drug targets and
represent over 60,000 associated drug candidates or experimental drugs in the
latent space using an autoencoder (AE) model trained from over 104 million
molecules. We build 32 ML models for cross-target analysis of these drug
candidates for side effects and repurposing potential. We further screen the
absorption, distribution, metabolism, excretion, and toxicity (ADMET)
properties of these candidates. Our platform reveals that essentially all of
the existing drug candidates, including dozens of experimental drugs, fail to
pass our cross-target and ADMET screenings. Nonetheless, we have identified two
nearly optimal leads for further optimization.
【36】 Experimental Evaluation of Computational Complexity for Different Neural Network Equalizers in Optical Communications
标题:光通信中不同神经网络均衡器计算复杂度的实验评估
链接:https://arxiv.org/abs/2109.08711
作者:Pedro J. Freire,Yevhenii Osadchuk,Antonio Napoli,Bernhard Spinnler,Wolfgang Schairer,Nelson Costa,Jaroslaw E. Prilepsky,Sergei K. Turitsyn
机构: Aston University
备注:ORAL presentation at the Asia Communications and Photonics Conference (ACP 2021)
摘要:Addressing the neural network-based optical channel equalizers, we quantify
the trade-off between their performance and complexity by carrying out the
comparative analysis of several neural network architectures, presenting the
results for TWC and SSMF set-ups.
其他(23篇)
【1】 Scaling TensorFlow to 300 million predictions per second
标题:将TensorFlow扩展到每秒3亿次预测
链接:https://arxiv.org/abs/2109.09541
作者:Jan Hartman,Davorin Kopič
摘要:We present the process of transitioning machine learning models to the
TensorFlow framework at a large scale in an online advertising ecosystem. In
this talk we address the key challenges we faced and describe how we
successfully tackled them; notably, implementing the models in TF and serving
them efficiently with low latency using various optimization techniques.
【2】 GhostShiftAddNet: More Features from Energy-Efficient Operations
标题:GhostShiftAddNet:节能操作的更多功能
链接:https://arxiv.org/abs/2109.09495
作者:Jia Bi,Jonathon Hare,Geoff V. Merrett
机构:Electronics and Computer Science, University of Southampton
摘要:Deep convolutional neural networks (CNNs) are computationally and memory
intensive. In CNNs, intensive multiplication can have resource implications
that may challenge the ability for effective deployment of inference on
resource-constrained edge devices. This paper proposes GhostShiftAddNet, where
the motivation is to implement a hardware-efficient deep network: a
multiplication-free CNN with fewer redundant features. We introduce a new
bottleneck block, GhostSA, that converts all multiplications in the block to
cheap operations. The bottleneck uses an appropriate number of bit-shift
filters to process intrinsic feature maps, then applies a series of
transformations that consist of bit-wise shifts with addition operations to
generate more feature maps that fully learn to capture information underlying
intrinsic features. We schedule the number of bit-shift and addition operations
for different hardware platforms. We conduct extensive experiments and ablation
studies with desktop and embedded (Jetson Nano) devices for implementation and
measurements. We demonstrate the proposed GhostSA block can replace bottleneck
blocks in the backbone of state-of-the-art networks architectures and gives
improved performance on image classification benchmarks. Further, our
GhostShiftAddNet can achieve higher classification accuracy with fewer FLOPs
and parameters (reduced by up to 3x) than GhostNet. When compared to GhostNet,
inference latency on the Jetson Nano is improved by 1.3x and 2x on the GPU and
CPU respectively.
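The arithmetic saving behind shift-based layers comes from constraining weights to signed powers of two, so each multiply becomes a bit-shift. A toy quantizer illustrating that constraint; the GhostSA block itself composes such shifts with additions in a specific bottleneck design not reproduced here.

import numpy as np

def to_shift_weights(w, min_exp=-8, max_exp=0):
    # Round each weight to the nearest signed power of two, so multiplying
    # by it reduces to a bit-shift in fixed-point hardware.
    sign = np.sign(w)
    exp = np.clip(np.round(np.log2(np.abs(w) + 1e-12)), min_exp, max_exp)
    return sign * np.exp2(exp)

w = np.random.default_rng(0).normal(scale=0.1, size=5)
print(w, to_shift_weights(w))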
【3】 Towards Ubiquitous Indoor Positioning: Comparing Systems across Heterogeneous Datasets
标题:走向无处不在的室内定位:跨异构数据集比较系统
链接:https://arxiv.org/abs/2109.09436
作者:Joaquín Torres-Sospedra,Ivo Silva,Lucie Klus,Darwin Quezada-Gaibor,Antonino Crivello,Paolo Barsocchi,Cristiano Pendão,Elena Simona Lohan,Jari Nurmi,Adriano Moreira
机构:∗UBIK Geospatial Solutions S.L., Castellon, Spain, † Algoritmi Research Center, University of Minho, Guimar˜aes, Portugal, Tampere University, Tampere, Finland, §Institute of New Imaging Technologies, Universitat Jaume I, Castellon, Spain
备注:to appear in 2021 International Conference on Indoor Positioning and Indoor Navigation (IPIN), 29 Nov. - 2 Dec. 2021, Lloret de Mar, Spain
摘要:The evaluation of Indoor Positioning Systems (IPS) mostly relies on local
deployments in the researchers' or partners' facilities. The complexity of
preparing comprehensive experiments, collecting data, and considering multiple
scenarios usually limits the evaluation area and, therefore, the assessment of
the proposed systems. The requirements and features of controlled experiments
cannot be generalized since the use of the same sensors or anchors density
cannot be guaranteed. The dawn of datasets is pushing IPS evaluation to a
similar level as machine-learning models, where new proposals are evaluated
over many heterogeneous datasets. This paper proposes a way to evaluate IPSs in
multiple scenarios, that is validated with three use cases. The results prove
that the proposed aggregation of the evaluation metric values is a useful tool
for high-level comparison of IPSs.
【4】 Background-Foreground Segmentation for Interior Sensing in Automotive Industry
标题:汽车工业车内传感中的背景-前景分割
链接:https://arxiv.org/abs/2109.09410
作者:Claudia Drygala,Matthias Rottmann,Hanno Gottschalk,Klaus Friedrichs,Thomas Kurbiel
机构:University of Wuppertal, School of Mathematics and Natural Sciences, IMACM & IZMD, Aptiv Services Deutschland GmbH, Wuppertal, Germany
摘要:To ensure safety in automated driving, the correct perception of the
situation inside the car is as important as its environment. Thus, seat
occupancy detection and classification of detected instances play an important
role in interior sensing. Given knowledge of the seat occupancy status, it is
possible, e.g., to automate airbag deployment control. Furthermore, the
presence of a driver, which is necessary for partially automated driving at
automation levels two to four, can be verified. In this work, we compare
different statistical methods from the field of image segmentation to approach
the problem of background-foreground segmentation in camera based interior
sensing. In recent years, several methods based on different techniques have
been developed and applied to images and videos from various applications. The
peculiarity of interior sensing scenarios is that both the foreground instances
and the background contain static as well as dynamic elements. In the data
considered in this work, even the camera position is not completely fixed. We
review and benchmark three different methods, i.e., Gaussian Mixture Models
(GMM), Morphological Snakes, and a deep neural network, namely a Mask R-CNN. In
particular, the limitations of the classical methods, GMM and Morphological
Snakes, for interior sensing are shown. Furthermore, it turns out that it is
possible to overcome these limitations by deep learning, e.g. using a Mask
R-CNN. Although only a small amount of ground
truth data was available for training, we enabled the Mask R-CNN to produce
high quality background-foreground masks via transfer learning. Moreover, we
demonstrate that certain augmentation as well as pre- and post-processing
methods further enhance the performance of the investigated methods.
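For reference, the classical GMM baseline benchmarked here is available off the shelf in OpenCV; a minimal sketch (the clip path and parameter values are placeholders, not the paper's setup):

```python
import cv2

cap = cv2.VideoCapture("interior_camera.mp4")  # placeholder cabin recording
mog2 = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                          detectShadows=True)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = mog2.apply(frame)  # 255 = foreground, 127 = shadow, 0 = background
cap.release()
```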
【5】 UPV at CheckThat! 2021: Mitigating Cultural Differences for Identifying Multilingual Check-worthy Claims
标题:CheckThat!2021:减少文化差异以识别多语言的值得检查的声明
链接:https://arxiv.org/abs/2109.09232
作者:Ipek Baris Schlicht,Angel Felipe Magnossão de Paula,Paolo Rosso
机构:Universitat Politècnica de València, Spain
备注:None
摘要:Identifying check-worthy claims is often the first step of automated
fact-checking systems. Tackling this task in a multilingual setting has been
understudied. Encoding inputs with multilingual text representations could be
one approach to solving multilingual check-worthiness detection. However, this
approach could suffer if cultural bias exists within the communities regarding
what is check-worthy. In this paper, we propose a language identification task
as an auxiliary task to mitigate unintended bias. For this purpose, we
experiment with joint training using the datasets from CLEF-2021 CheckThat!,
which contain tweets in English, Arabic, Bulgarian, Spanish and
Turkish. Our results show that joint training of language identification and
check-worthy claim detection tasks can provide performance gains for some of
the selected languages.
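A minimal sketch of this kind of joint training (hidden size, loss weight, and head shapes are assumptions; the shared multilingual encoder is not shown):

```python
import torch.nn as nn
import torch.nn.functional as F

class JointHeads(nn.Module):
    """Two heads over pooled sentence vectors from a shared multilingual
    encoder: the main check-worthiness classifier plus the auxiliary
    language-identification task used to mitigate cultural bias."""
    def __init__(self, hidden=768, n_langs=5):
        super().__init__()
        self.claim_head = nn.Linear(hidden, 2)
        self.lang_head = nn.Linear(hidden, n_langs)

    def loss(self, pooled, claim_y, lang_y, alpha=0.5):
        # Joint objective: main task + weighted auxiliary language-ID task.
        return (F.cross_entropy(self.claim_head(pooled), claim_y)
                + alpha * F.cross_entropy(self.lang_head(pooled), lang_y))
```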
【6】 ARCA23K: An audio dataset for investigating open-set label noise
标题:ARCA23K:一个用于研究开集标签噪声的音频数据集
链接:https://arxiv.org/abs/2109.09227
作者:Turab Iqbal,Yin Cao,Andrew Bailey,Mark D. Plumbley,Wenwu Wang
机构:Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, UK
备注:Accepted to the Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021)
摘要:The availability of audio data on sound sharing platforms such as Freesound
gives users access to large amounts of annotated audio. Utilising such data for
training is becoming increasingly popular, but the problem of label noise that
is often prevalent in such datasets requires further investigation. This paper
introduces ARCA23K, an Automatically Retrieved and Curated Audio dataset
comprising over 23,000 labelled Freesound clips. Unlike past datasets such as
FSDKaggle2018 and FSDnoisy18K, ARCA23K facilitates the study of label noise in
a more controlled manner. We describe the entire process of creating the
dataset such that it is fully reproducible, meaning researchers can extend our
work with little effort. We show that the majority of labelling errors in
ARCA23K are due to out-of-vocabulary audio clips, and we refer to this type of
label noise as open-set label noise. Experiments are carried out in which we
study the impact of label noise in terms of classification performance and
representation learning.
【7】 Multiscale Manifold Warping
标题:多尺度流形翘曲
链接:https://arxiv.org/abs/2109.09222
作者:Sridhar Mahadevan,Anup Rao,Georgios Theocharous,Jennifer Healey
机构:Adobe Research, Park Avenue, San Jose, CA
备注:18 pages
摘要:Many real-world applications require aligning two temporal sequences,
including bioinformatics, handwriting recognition, activity recognition, and
human-robot coordination. Dynamic Time Warping (DTW) is a popular alignment
method, but can fail on high-dimensional real-world data where the dimensions
of aligned sequences are often unequal. In this paper, we show that exploiting
the multiscale manifold latent structure of real-world data can yield improved
alignment. We introduce a novel framework called Warping on Wavelets (WOW) that
integrates DTW with a multi-scale manifold learning framework called
Diffusion Wavelets. We present a theoretical analysis of the WOW family of
algorithms and show that it outperforms previous state-of-the-art methods, such
as canonical time warping (CTW) and manifold warping, on several real-world
datasets.
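For readers unfamiliar with the base aligner, a plain DTW recurrence is sketched below; WOW's contribution is to align multiscale diffusion-wavelet embeddings of the sequences rather than the raw features:

```python
import numpy as np

def dtw(x, y):
    """Textbook DTW: D[i, j] is the cost of the best alignment of the first
    i items of x with the first j items of y."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(np.asarray(x[i - 1]) - np.asarray(y[j - 1]))
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```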
【8】 JEM++: Improved Techniques for Training JEM
标题:JEM++:改进的JEM训练技术
链接:https://arxiv.org/abs/2109.09032
作者:Xiulong Yang,Shihao Ji
机构:Department of Computer Science, Georgia State University
备注:Published as a conference paper at ICCV 2021
摘要:Joint Energy-based Model (JEM) is a recently proposed hybrid model that
retains strong discriminative power of modern CNN classifiers, while generating
samples rivaling the quality of GAN-based approaches. In this paper, we propose
a variety of new training procedures and architecture features to improve JEM's
accuracy, training stability, and speed altogether. 1) We propose a proximal
SGLD to generate samples in the proximity of samples from the previous step,
which improves the stability. 2) We further treat the approximate maximum
likelihood learning of EBM as a multi-step differential game, and extend the
YOPO framework to cut out redundant calculations during backpropagation, which
accelerates the training substantially. 3) Rather than initializing SGLD chain
from random noise, we introduce a new informative initialization that samples
from a distribution estimated from training data. 4) This informative
initialization allows us to enable batch normalization in JEM, which further
releases the power of modern CNN architectures for hybrid modeling. Code:
https://github.com/sndnyang/JEMPP
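A rough sketch of point 1), proximal SGLD (step size, noise scale, and the projection radius are illustrative assumptions, not the paper's settings):

```python
import torch

def proximal_sgld(energy_fn, x, n_steps=20, step=1.0, noise=0.01, radius=0.1):
    """SGLD with a proximal-style clamp: each Langevin update is projected
    back into a small ball around the previous iterate, so samples stay in
    the proximity of the last step, which stabilizes training."""
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(energy_fn(x).sum(), x)[0]
        delta = -0.5 * step * grad + noise * torch.randn_like(x)
        x = x + delta.clamp(-radius, radius)  # the proximal projection
    return x.detach()
```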
【9】 G-CoS: GNN-Accelerator Co-Search Towards Both Better Accuracy and Efficiency
标题:G-CoS:GNN-Accelerator协同搜索以获得更高的精确度和效率
链接:https://arxiv.org/abs/2109.08983
作者:Yongan Zhang,Haoran You,Yonggan Fu,Tong Geng,Ang Li,Yingyan Lin
机构:Rice University, Houston, TX; Pacific Northwest National Laboratory, Richland, WA
备注:Accepted at ICCAD 2021
摘要:Graph Neural Networks (GNNs) have emerged as the state-of-the-art (SOTA)
method for graph-based learning tasks. However, it remains prohibitively
challenging to run inference with GNNs over large graph datasets, limiting their
application to large-scale real-world tasks. While end-to-end jointly
optimizing GNNs and their accelerators is promising in boosting GNNs' inference
efficiency and expediting the design process, it is still underexplored due to
the vast and distinct design spaces of GNNs and their accelerators. In this
work, we propose G-CoS, a GNN and accelerator co-search framework that can
automatically search for matched GNN structures and accelerators to maximize
both task accuracy and acceleration efficiency. Specifically, G-CoS integrates
two major enabling components: (1) a generic GNN accelerator search space which
is applicable to various GNN structures and (2) a one-shot GNN and accelerator
co-search algorithm that enables simultaneous and efficient search for optimal
GNN structures and their matched accelerators. To the best of our knowledge,
G-CoS is the first co-search framework for GNNs and their accelerators.
Extensive experiments and ablation studies show that the GNNs and accelerators
generated by G-CoS consistently outperform SOTA GNNs and GNN accelerators in
terms of both task accuracy and hardware efficiency, while only requiring a few
hours for the end-to-end generation of the best matched GNNs and their
accelerators.
【10】 AI Accelerator Survey and Trends
标题:人工智能加速器现状及发展趋势
链接:https://arxiv.org/abs/2109.08957
作者:Albert Reuther,Peter Michaleas,Michael Jones,Vijay Gadepally,Siddharth Samsi,Jeremy Kepner
机构:MIT Lincoln Laboratory Supercomputing Center, Lexington, MA, USA
备注:9 pages, 2 figures, IEEE High Performance Extreme Computing Conference 2021
摘要:Over the past several years, new machine learning accelerators have been
announced and released every month for a variety of applications, from speech
recognition and video object detection to assisted driving and many data center
applications. This paper updates the survey of AI accelerators and processors
from the past two years. It collects and summarizes the current commercial
accelerators that have been publicly announced with peak performance and power
consumption numbers. The performance and power values are plotted on a scatter
graph, and a number of dimensions and observations from the trends on this plot
are again discussed and analyzed. This year, we also compile a list of
benchmarking performance results and compute the computational efficiency with
respect to peak performance.
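The survey's headline quantity, computational efficiency, is simply peak performance over peak power; a toy sketch of the scatter view (the data points are placeholders):

```python
import matplotlib.pyplot as plt

# Placeholder entries: (peak ops/s, peak W); efficiency = ops/s per watt.
accelerators = {"chip_a": (1e12, 10.0), "chip_b": (4e14, 300.0)}
for name, (perf, power) in accelerators.items():
    plt.scatter(power, perf, label=f"{name}: {perf / power:.2e} ops/J")
plt.xscale("log"); plt.yscale("log")
plt.xlabel("Peak power (W)"); plt.ylabel("Peak performance (ops/s)")
plt.legend(); plt.show()
```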
【11】 Towards Resilient Artificial Intelligence: Survey and Research Issues
标题:迈向有弹性的人工智能:调查与研究问题
链接:https://arxiv.org/abs/2109.08904
作者:Oliver Eigner,Sebastian Eresheim,Peter Kieseberg,Lukas Daniel Klausner,Martin Pirker,Torsten Priebe,Simon Tjoa,Fiammetta Marulli,Francesco Mercaldo
机构:Institute of IT Security Research, St. Pölten University of Applied Sciences, Department of Mathematics and Physics, University of Campania “Luigi Vanvitelli”, Department of Medicine and, Health Sciences “Vincenzo Tiberio”, University of Molise
备注:None
摘要:Artificial intelligence (AI) systems are becoming critical components of
today's IT landscapes. Their resilience against attacks and other environmental
influences needs to be ensured just like for other IT assets. Considering the
particular nature of AI, and machine learning (ML) in particular, this paper
provides an overview of the emerging field of resilient AI and presents
research issues the authors identify as potential future work.
【12】 Modern Evolution Strategies for Creativity: Fitting Concrete Images and Abstract Concepts
标题:创造力的现代进化策略:拟合具体形象和抽象概念
链接:https://arxiv.org/abs/2109.08857
作者:Yingtao Tian,David Ha
机构:Google Brain
摘要:Evolutionary algorithms have been used in the digital art scene since the
1970s. A popular application of genetic algorithms is to optimize the
procedural placement of vector graphic primitives to resemble a given painting.
In recent years, deep learning-based approaches have also been proposed to
generate procedural drawings, which can be optimized using gradient descent. In
this work, we revisit the use of evolutionary algorithms for computational
creativity. We find that modern evolution strategies (ES) algorithms, when
tasked with the placement of shapes, offer large improvements in both quality
and efficiency compared to traditional genetic algorithms, and are even
comparable to gradient-based methods. We demonstrate that ES is also well suited to
optimizing the placement of shapes to fit the CLIP model, and can produce
diverse, distinct geometric abstractions that are aligned with human
interpretation of language. Videos and demo: https://es-clip.github.io/
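A minimal natural-evolution-strategies loop of the kind revisited here (the fitness function, e.g. rendering shapes and scoring against a target image or CLIP, is left abstract; hyperparameters are illustrative):

```python
import numpy as np

def simple_es(fitness, dim, pop=64, sigma=0.1, lr=0.02, iters=200):
    """Estimate the fitness gradient from Gaussian perturbations of a flat
    parameter vector (e.g. shape coordinates and colours) and ascend it."""
    theta = np.zeros(dim)
    for _ in range(iters):
        eps = np.random.randn(pop, dim)
        scores = np.array([fitness(theta + sigma * e) for e in eps])
        z = (scores - scores.mean()) / (scores.std() + 1e-8)  # normalized scores
        theta += lr / (pop * sigma) * eps.T @ z
    return theta
```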
【13】 SpeechNAS: Towards Better Trade-off between Latency and Accuracy for Large-Scale Speaker Verification
标题:SpeechNAS:在大规模说话人验证中实现延迟和准确性之间的更好折衷
链接:https://arxiv.org/abs/2109.08839
作者:Wentao Zhu,Tianlong Kong,Shun Lu,Jixiang Li,Dawei Zhang,Feng Deng,Xiaorui Wang,Sen Yang,Ji Liu
机构:Kuaishou Technology
备注:8 pages, 3 figures, 3 tables. Accepted by ASRU2021
摘要:Recently, x-vector has been a successful and popular approach for speaker
verification, which employs a time delay neural network (TDNN) and statistics
pooling to extract speaker characterizing embedding from variable-length
utterances. Improvement upon the x-vector has been an active research area, and
numerous neural networks have been elaborately designed based on the x-vector,
e.g., extended TDNN (E-TDNN), factorized TDNN (F-TDNN), and densely connected
TDNN (D-TDNN). In this work, we try to identify the optimal architectures from
a TDNN based search space employing neural architecture search (NAS), named
SpeechNAS. Leveraging recent advances in speaker recognition, such as
high-order statistics pooling, multi-branch mechanism, D-TDNN and angular
additive margin softmax (AAM) loss with a minimum hyper-spherical energy (MHE),
SpeechNAS automatically discovers five network architectures, from SpeechNAS-1
to SpeechNAS-5, of various numbers of parameters and GFLOPs on the large-scale
text-independent speaker recognition dataset VoxCeleb1. Our derived best neural
network achieves an equal error rate (EER) of 1.02% on the standard test set of
VoxCeleb1, which surpasses previous TDNN based state-of-the-art approaches by a
large margin. Code and trained weights are available at
https://github.com/wentaozhu/speechnas.git
【14】 BERT-Beta: A Proactive Probabilistic Approach to Text Moderation
标题:Bert-Beta:文本审核的一种主动概率方法
链接:https://arxiv.org/abs/2109.08805
作者:Fei Tan,Yifan Hu,Kevin Yen,Changwei Hu
机构:Yahoo Research, New York, NY, USA
备注:9 pages, EMNLP'21
摘要:Text moderation for user generated content, which helps to promote healthy
interaction among users, has been widely studied and many machine learning
models have been proposed. In this work, we explore an alternative perspective
by augmenting reactive reviews with proactive forecasting. Specifically, we
propose a new concept, "text toxicity propensity", to characterize the extent
to which a text tends to attract toxic comments. Beta regression is then
introduced to do the probabilistic modeling, which is demonstrated to function
well in comprehensive experiments. We also propose an explanation method to
communicate the model decision clearly. Both propensity scoring and
interpretation benefit text moderation in a novel manner. Finally, the proposed
scaling mechanism for the linear model offers useful insights beyond this work.
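A hedged sketch of the Beta-regression head (hidden size and the softplus parameterization are assumptions; the text encoder producing the pooled vector h is not shown):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaHead(nn.Module):
    """Predicts the two Beta parameters of the toxicity-propensity
    distribution from a pooled text representation and trains by NLL."""
    def __init__(self, hidden=768):
        super().__init__()
        self.alpha = nn.Linear(hidden, 1)
        self.beta = nn.Linear(hidden, 1)

    def nll(self, h, y):
        # y in (0, 1): observed propensity targets, same shape as the outputs.
        a = F.softplus(self.alpha(h)) + 1e-4
        b = F.softplus(self.beta(h)) + 1e-4
        return -torch.distributions.Beta(a, b).log_prob(y).mean()
```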
【15】 An Empirical Evaluation of the t-SNE Algorithm for Data Visualization in Structural Engineering
标题:t-SNE算法在结构工程数据可视化中的实证评价
链接:https://arxiv.org/abs/2109.08795
作者:Parisa Hajibabaee,Farhad Pourkamali-Anaraki,Mohammad Amin Hariri-Ardebili
机构:Computer Science, University of Massachusetts, Lowell, USA, Civil Environmental and Architectural Engineering, University of Colorado, Boulder, USA
备注:This paper has been accepted for publication in IEEE International Conference on Machine Learning and Applications 2021
摘要:A fundamental task in machine learning involves visualizing high-dimensional
data sets that arise in high-impact application domains. When considering the
context of large imbalanced data, this problem becomes much more challenging.
In this paper, the t-Distributed Stochastic Neighbor Embedding (t-SNE)
algorithm is used to reduce the dimensions of an earthquake engineering related
data set for visualization purposes. Since imbalanced data sets greatly affect
the accuracy of classifiers, we employ Synthetic Minority Oversampling
Technique (SMOTE) to tackle the imbalanced nature of such data sets. We present
the results obtained from t-SNE and SMOTE and compare them to the basic
approaches in various aspects. Considering four options and six classification
algorithms, we show that when using t-SNE on the imbalanced data and SMOTE on
the training data set, neural network classifiers achieve promising results
without sacrificing accuracy. Hence, we can transform the studied scientific data into
a two-dimensional (2D) space, enabling the visualization of the classifier and
the resulting decision surface using a 2D plot.
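One way to wire the two tools together (a synthetic stand-in replaces the earthquake-engineering data; parameters are library defaults, not the paper's):

```python
from sklearn.datasets import make_classification
from sklearn.manifold import TSNE
from imblearn.over_sampling import SMOTE

# Imbalanced toy data standing in for the structural-engineering set.
X, y = make_classification(n_samples=500, n_features=20, weights=[0.9, 0.1],
                           random_state=0)
# SMOTE balances the training data fed to the classifiers...
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)
# ...while t-SNE embeds the data into 2D for visualizing the decision surface.
X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
```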
【16】 Back-translation for Large-Scale Multilingual Machine Translation
标题:面向大规模多语种机器翻译的反向翻译
链接:https://arxiv.org/abs/2109.08712
作者:Baohao Liao,Shahram Khadivi,Sanjika Hewavitharana
机构:eBay Inc.
摘要:This paper illustrates our approach to the shared task on large-scale
multilingual machine translation in the sixth conference on machine translation
(WMT-21). This work aims to build a single multilingual translation system with
a hypothesis that a universal cross-language representation leads to better
multilingual translation performance. We extend the exploration of different
back-translation methods from bilingual translation to multilingual
translation. Better performance is obtained by the constrained sampling method,
which differs from the finding for bilingual translation. Besides, we also
explore the effect of vocabulary size and the amount of synthetic data.
Surprisingly, smaller vocabularies perform better, and the extensive
monolingual English data offers a modest improvement. We submitted our system
to both small tasks and achieved second place.
【17】 Relating Neural Text Degeneration to Exposure Bias
标题:神经文本退化与暴露偏向的关系
链接:https://arxiv.org/abs/2109.08705
作者:Ting-Rui Chiang,Yun-Nung Chen
机构:Carnegie Mellon University, National Taiwan University
备注:Accepted by BlackBoxNLP at EMNLP 2021
摘要:This work focuses on relating two mysteries in neural-based text generation:
exposure bias and text degeneration. Although exposure bias was identified long
ago and numerous studies have proposed remedies for it, to our knowledge its
impact on text generation has not yet been verified. Text degeneration is a
problem that the widely-used pre-trained language model GPT-2 was recently
found to suffer from (Holtzman et al., 2020). Motivated by the unknown cause of
text degeneration, in this paper we attempt to relate these two mysteries.
Specifically, we first qualitatively and quantitatively identify
mistakes made before text degeneration occurs. Then we investigate the
significance of the mistakes by inspecting the hidden states in GPT-2. Our
results show that text degeneration is likely to be partly caused by exposure
bias. We also study the self-reinforcing mechanism of text degeneration,
explaining why the mistakes amplify. In sum, our study provides a more concrete
foundation for further investigation on exposure bias and text degeneration
problems.
【18】 Acoustic Echo Cancellation using Residual U-Nets
标题:基于剩余U网的声学回波抵消
链接:https://arxiv.org/abs/2109.09686
作者:J. Silva-Rodríguez,M. F. Dolz,M. Ferrer,A. Castelló,V. Naranjo,G. Piñero
机构: Universitat Politecnica de Valencia
备注:6 pages, 2 figures, submitted to the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing on October 2020
摘要:This paper presents an acoustic echo canceler based on a U-Net convolutional
neural network for single-talk and double-talk scenarios. U-Net networks have
previously been used in the audio processing area for source separation
problems because of their ability to reproduce the finest details of audio
signals, but to our knowledge, this is the first time they have been used for
acoustic echo cancellation (AEC). The U-Net hyperparameters have been optimized
to obtain the best AEC performance, but using a reduced number of parameters to
meet a latency restriction of 40 ms. The training and testing of our model have
been carried out within the framework of the 'ICASSP 2021 AEC Challenge'
organized by Microsoft. We have trained the optimized U-Net model with a
synthetic dataset only (S-U-Net) and with a synthetic dataset and the
single-talk set of a real dataset (SR-U-Net); both datasets were released for
the challenge. The S-U-Net model presented better results for double-talk
scenarios, so its inferred near-end signals from the blind test set were
submitted to the challenge. Our canceler ranked 12th among 17 teams, and 5th
among 10 academia teams, obtaining an overall mean opinion score of 3.57.
【19】 Accelerated Stochastic Gradient for Nonnegative Tensor Completion and Parallel Implementation
标题:非负张量补全的加速随机梯度算法及其并行实现
链接:https://arxiv.org/abs/2109.09534
作者:Ioanna Siaminou,Ioannis Marios Papagiannakos,Christos Kolomvakis,Athanasios P. Liavas
机构:School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece
备注:5 pages, 4 figures, this work was accepted and presented at EUSIPCO 2021
摘要:We consider the problem of nonnegative tensor completion. We adopt the
alternating optimization framework and solve each nonnegative matrix completion
problem via a stochastic variation of the accelerated gradient algorithm. We
experimentally test the effectiveness and the efficiency of our algorithm using
both real-world and synthetic data. We develop a shared-memory implementation
of our algorithm using the multi-threaded API OpenMP, which attains significant
speedup. We believe that our approach is a very competitive candidate for the
solution of very large nonnegative tensor completion problems.
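A hedged sketch of the accelerated inner solver (one nonnegative least-squares factor update; the full stochastic, alternating, OpenMP-parallel algorithm is not reproduced):

```python
import numpy as np

def accel_nn_ls(A, B, X0, iters=50):
    """Nesterov-accelerated projected gradient for min_{X>=0} 0.5*||A - XB||_F^2,
    the building block of alternating optimization for nonnegative completion."""
    L = np.linalg.norm(B @ B.T, 2)  # Lipschitz constant of the gradient
    step = 1.0 / L
    X, Y, t = X0.copy(), X0.copy(), 1.0
    for _ in range(iters):
        grad = (Y @ B - A) @ B.T
        X_new = np.maximum(Y - step * grad, 0.0)       # projected gradient step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Y = X_new + ((t - 1.0) / t_new) * (X_new - X)  # momentum extrapolation
        X, t = X_new, t_new
    return X
```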
【20】 Scalable Multi-Task Gaussian Processes with Neural Embedding of Coregionalization
标题:协同区域化神经嵌入的可伸缩多任务高斯过程
链接:https://arxiv.org/abs/2109.09261
作者:Haitao Liu,Jiaqi Ding,Xinyu Xie,Xiaomo Jiang,Yusong Zhao,Xiaofang Wang
机构:School of Energy and Power Engineering, Dalian University of Technology, China; Digital Twin Laboratory for Industrial Equipment, Dalian University of Technology, China
备注:29 pages, 9 figures, 4 tables, preprint under review
摘要:Multi-task regression attempts to exploit the task similarity in order to
achieve knowledge transfer across related tasks for performance improvement.
The application of Gaussian process (GP) in this scenario yields the
non-parametric yet informative Bayesian multi-task regression paradigm.
Multi-task GP (MTGP) provides not only the prediction mean but also the
associated prediction variance to quantify uncertainty, thus gaining popularity
in various scenarios. The linear model of coregionalization (LMC) is a
well-known MTGP paradigm which exploits the dependency between tasks through a linear
combination of several independent and diverse GPs. The LMC however suffers
from high model complexity and limited model capability when handling
complicated multi-task cases. To this end, we develop the neural embedding of
coregionalization that transforms the latent GPs into a high-dimensional latent
space to induce rich yet diverse behaviors. Furthermore, we use advanced
variational inference as well as sparse approximation to devise a tight and
compact evidence lower bound (ELBO) for higher quality of scalable model
inference. Extensive numerical experiments have been conducted to verify the
higher prediction quality and better generalization of our model, named NSVLMC,
on various real-world multi-task datasets and the cross-fluid modeling of
unsteady fluidized bed.
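For reference, the LMC construction the abstract builds on (standard notation, not taken from the paper): each of the T task outputs mixes Q independent latent GPs linearly, and the proposed neural embedding replaces this linear mixing with a learned map.

```latex
f_t(x) = \sum_{q=1}^{Q} a_{t,q}\, g_q(x), \qquad
g_q \sim \mathcal{GP}\bigl(0,\, k_q(x, x')\bigr), \quad t = 1, \dots, T.
```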
【21】 Coordinate Descent for MCP/SCAD Penalized Least Squares Converges Linearly
标题:MCP/SCAD惩罚最小二乘的坐标下降线性收敛
链接:https://arxiv.org/abs/2109.08850
作者:Yuling Jiao,Dingwei Li,Min Liu,Xiliang Lu
机构:School of Mathematics and Statistics, Wuhan University
摘要:Recovering sparse signals from observed data is an important topic in
signal/imaging processing, statistics and machine learning. Nonconvex penalized
least squares have attracted a lot of attention since they enjoy nice
statistical properties. Computationally, coordinate descent (CD) is a workhorse
for minimizing the nonconvex penalized least squares criterion due to its
simplicity and scalability. In this work, we prove a linear convergence rate
for CD when solving MCP/SCAD penalized least squares problems.
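For context, the MCP penalty in the criterion being minimized (standard definition; SCAD is analogous):

```latex
p_{\lambda,\gamma}(t) =
\begin{cases}
  \lambda |t| - \dfrac{t^2}{2\gamma}, & |t| \le \gamma\lambda,\\[4pt]
  \dfrac{\gamma\lambda^2}{2},         & |t| > \gamma\lambda,
\end{cases}
\qquad \gamma > 1.
```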
【22】 A Robust and Efficient Multi-Scale Seasonal-Trend Decomposition
标题:一种稳健有效的多尺度季节趋势分解方法
链接:https://arxiv.org/abs/2109.08800
作者:Linxiao Yang,Qingsong Wen,Bo Yang,Liang Sun
机构:Machine Intelligence Technology, Alibaba Group, Hangzhou, China, Machine Intelligence Technology, Alibaba Group, Bellevue, USA
备注:None
摘要:Many real-world time series exhibit multiple seasonality with different
lengths. The removal of seasonal components is crucial in numerous applications
of time series, including forecasting and anomaly detection. However, many
seasonal-trend decomposition algorithms suffer from high computational cost and
require a large amount of data when multiple seasonal components exist,
especially when the periodic length is long. In this paper, we propose a
general and efficient multi-scale seasonal-trend decomposition algorithm for
time series with multiple seasonality. We first down-sample the original time
series onto a lower resolution, and then convert it to a time series with
single seasonality. Thus, existing seasonal-trend decomposition algorithms can
be applied directly to obtain the rough estimates of trend and the seasonal
component corresponding to the longer periodic length. By considering the
relationship between different resolutions, we formulate the recovery of
different components on the high resolution as an optimization problem, which
is solved efficiently by our alternating direction method of multipliers (ADMM)
based algorithm. Our experimental results demonstrate the accurate
decomposition results with significantly improved efficiency.
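A hedged sketch of the down-sampling step (STL stands in for whatever single-seasonality decomposition is plugged in; k must divide the long period):

```python
import numpy as np
from statsmodels.tsa.seasonal import STL

def downsample_then_decompose(y, k, long_period):
    """Average every k points so the long seasonal length becomes short,
    then run an off-the-shelf single-seasonality decomposition."""
    y = np.asarray(y, dtype=float)
    y_low = y[: len(y) // k * k].reshape(-1, k).mean(axis=1)
    res = STL(y_low, period=long_period // k).fit()
    return res.trend, res.seasonal  # rough estimates at the low resolution
```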
【23】 Small Lesion Segmentation in Brain MRIs with Subpixel Embedding
标题:亚像素嵌入法在脑MRI小病灶分割中的应用
链接:https://arxiv.org/abs/2109.08791
作者:Alex Wong,Allison Chen,Yangchao Wu,Safa Cicek,Alexandre Tiard,Byung-Woo Hong,Stefano Soatto
机构:University of California, Los Angeles, CA, USA, Chung-Ang University, Seoul, Korea
摘要:We present a method to segment MRI scans of the human brain into ischemic
stroke lesion and normal tissues. We propose a neural network architecture in
the form of a standard encoder-decoder where predictions are guided by a
spatial expansion embedding network. Our embedding network learns features that
can resolve detailed structures in the brain without the need for
high-resolution training images, which are often unavailable and expensive to
acquire. In contrast, the encoder-decoder learns global structures by means
of striding and max pooling. Our embedding network complements the
encoder-decoder architecture by guiding the decoder with fine-grained details
lost to spatial downsampling during the encoder stage. Unlike previous works,
our decoder outputs at 2 times the input resolution, where a single pixel in
the input resolution is predicted by four neighboring subpixels in our output.
To obtain the output at the original scale, we propose a learnable downsampler
(as opposed to hand-crafted ones e.g. bilinear) that combines subpixel
predictions. Our approach improves the baseline architecture by approximately
11.7% and achieves the state of the art on the ATLAS public benchmark dataset
with a smaller memory footprint and faster runtime than the best competing
method. Our source code has been made available at:
https://github.com/alexklwong/subpixel-embedding-segmentation.
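A minimal PyTorch sketch of the learnable-downsampler idea (channel sizes are illustrative; this reconstructs the described mechanism, not the released code at the repository above):

```python
import torch.nn as nn

class LearnableDownsampler(nn.Module):
    """Folds the four subpixels covering each input-resolution pixel into
    channels and mixes them with a learned 1x1 convolution, instead of
    fixed hand-crafted (e.g. bilinear) weights."""
    def __init__(self, n_classes):
        super().__init__()
        self.fold = nn.PixelUnshuffle(2)  # (N, C, 2H, 2W) -> (N, 4C, H, W)
        self.mix = nn.Conv2d(4 * n_classes, n_classes, kernel_size=1)

    def forward(self, logits_2x):
        return self.mix(self.fold(logits_2x))
```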
Machine translation, for reference only.
