Machine Learning Academic Digest [9.24]

cs.LG: 83 papers today

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (4 papers)

【1】 wsGAT: Weighted and Signed Graph Attention Networks for Link Prediction
Link: https://arxiv.org/abs/2109.11519

Authors: Marco Grassia, Giuseppe Mangioni
Affiliations: Department of Electrical, Electronic and Computer Engineering, University of Catania, Catania, Italy
Abstract: Graph Neural Networks (GNNs) have been widely used to learn representations on graphs and tackle many real-world problems from a wide range of domains. In this paper we propose wsGAT, an extension of the Graph Attention Network (GAT) layers, meant to address the lack of GNNs that can handle graphs with signed and weighted links, which are ubiquitous, for instance, in trust and correlation networks. We first evaluate the performance of our proposal by comparing against GCNII in the weighted link prediction task, and against SGCN in the link sign prediction task. After that, we combine the two tasks and show their performance on predicting the signed weight of links, and their existence. Our results on real-world networks show that models with wsGAT layers outperform the ones with GCNII and SGCN layers, and that there is no loss in performance when signed weights are predicted.

【2】 Graph Neural Netwrok with Interaction Pattern for Group Recommendation
Link: https://arxiv.org/abs/2109.11345

Authors: Bojie Wang, Yuheng Lu
Affiliations: School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Xitucheng Road, Haidian District, Beijing, China
Abstract: With the development of social platforms, people are increasingly inclined to form groups to participate in activities, so group recommendation has gradually become a problem worthy of research. For group recommendation, an important issue is how to obtain the feature representations of the group and the item from personal interaction history, and how to obtain the group's preference for the item. For this problem, we propose the model GIP4GR (Graph Neural Network with Interaction Pattern For Group Recommendation). Specifically, our model uses the graph neural network framework, with its powerful representation capabilities, to represent the interactions among groups, users, and items in the topological structure of the graph, and at the same time analyzes the interaction pattern of the graph to adjust the feature output of the graph neural network. The resulting feature representations of groups and items are used to calculate the group's preference for items. We conducted extensive experiments on two real-world datasets to illustrate the superior performance of our model.

【3】 Orthogonal Graph Neural Networks
Link: https://arxiv.org/abs/2109.11338

Authors: Kai Guo, Kaixiong Zhou, Xia Hu, Yu Li, Yi Chang, Xin Wang
Affiliations: School of Artificial Intelligence, Jilin University, Changchun, China; Department of Computer Science, Rice University, USA; College of Computer Science and Technology, Jilin University, Changchun, China
Abstract: Graph neural networks (GNNs) have received tremendous attention due to their superiority in learning node representations. These models rely on message passing and feature transformation functions to encode the structural and feature information from neighbors. However, stacking more convolutional layers significantly decreases the performance of GNNs. Most recent studies attribute this limitation to the over-smoothing issue, where node embeddings converge to indistinguishable vectors. Through a number of experimental observations, we argue that the main factor degrading the performance is the unstable forward normalization and backward gradient resulting from the improper design of the feature transformation, especially for shallow GNNs where over-smoothing has not happened. Therefore, we propose a novel orthogonal feature transformation, named Ortho-GConv, which could generally augment the existing GNN backbones to stabilize the model training and improve the model's generalization performance. Specifically, we maintain the orthogonality of the feature transformation comprehensively from three perspectives, namely hybrid weight initialization, orthogonal transformation, and orthogonal regularization. By equipping the existing GNNs (e.g. GCN, JKNet, GCNII) with Ortho-GConv, we demonstrate the generality of the orthogonal feature transformation to enable stable training, and show its effectiveness for node and graph classification tasks.
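Illustration: the abstract names orthogonal regularization as one of three ways to keep the feature transformation orthogonal, without giving a formula. A common generic form of such a penalty is the squared Frobenius norm of W^T W - I, which is zero exactly when the columns of W are orthonormal. The sketch below assumes that form; it is not taken from the paper, and the function names `gram_matrix` and `orthogonal_penalty` are hypothetical.

```python
def gram_matrix(w):
    """Return W^T W for a matrix W given as a list of rows."""
    rows, cols = len(w), len(w[0])
    return [[sum(w[k][i] * w[k][j] for k in range(rows)) for j in range(cols)]
            for i in range(cols)]


def orthogonal_penalty(w):
    """Squared Frobenius norm of (W^T W - I); assumed penalty form, not the paper's."""
    gram = gram_matrix(w)
    n = len(gram)
    return sum((gram[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(n) for j in range(n))


# A rotation matrix is orthogonal, so its penalty vanishes.
rotation = [[0.0, 1.0], [-1.0, 0.0]]
# Scaling by 2 breaks orthonormality: W^T W = 4I, so the penalty is 3^2 + 3^2.
scaled = [[2.0, 0.0], [0.0, 2.0]]

print(orthogonal_penalty(rotation))
print(orthogonal_penalty(scaled))
```

In training, a term like this would typically be added to the task loss with a small weight, so that the weight matrix is nudged toward orthogonality rather than constrained exactly.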

【4】 A Novel Factor Graph-Based Optimization Technique for Stereo Correspondence Estimation
Link: https://arxiv.org/abs/2109.11077

Authors: Hanieh Shabanian, Madhusudhanan Balasubramanian
Affiliations: The University of Memphis
Comments: 25 pages, 5 figures, 6 tables
Abstract: Dense disparities among multiple views are essential for estimating the 3D architecture of a scene based on the geometrical relationship among the scene and the views or cameras. Scenes with larger extents of heterogeneous textures, differing scene illumination among the multiple views, and occluding objects affect the accuracy of the estimated disparities. Markov random field (MRF) based methods for disparity estimation address these limitations using spatial dependencies among the observations and among the disparity estimates. These methods, however, are limited by spatially fixed and smaller neighborhood systems or cliques. In this work, we present a new factor graph-based probabilistic graphical model for disparity estimation that allows a larger and spatially variable neighborhood structure determined based on the local scene characteristics. We evaluated our method using the Middlebury benchmark stereo datasets and the Middlebury evaluation dataset version 3.0 and compared its performance with recent state-of-the-art disparity estimation algorithms. The new factor graph-based method provided disparity estimates with higher accuracy when compared to the recent non-learning- and learning-based disparity estimation algorithms. In addition to disparity estimation, our factor graph formulation can be useful for obtaining maximum a posteriori solutions to optimization problems with complex and variable dependency structures, as well as for other dense estimation problems such as optical flow estimation.

GAN | Adversarial | Attack | Generation (6 papers)

【1】 FooBaR: Fault Fooling Backdoor Attack on Neural Network Training
Link: https://arxiv.org/abs/2109.11249

Authors: Jakub Breier, Xiaolu Hou, Martín Ochoa, Jesus Solano
Affiliations: TU-Graz SAL DES Lab and Graz University of Technology, Slovak University of Technology
Abstract: Neural network implementations are known to be vulnerable to physical attack vectors such as fault injection attacks. So far, these attacks have only been utilized during the inference phase with the intention of causing a misclassification. In this work, we explore a novel attack paradigm by injecting faults during the training phase of a neural network in a way that the resulting network can be attacked during deployment without the necessity of further faulting. In particular, we discuss attacks against ReLU activation functions that make it possible to generate a family of malicious inputs, which are called fooling inputs, to be used at inference time to induce controlled misclassifications. Such malicious inputs are obtained by mathematically solving a system of linear equations that would cause a particular behaviour on the attacked activation functions, similar to the one induced in training through faulting. We call such attacks fooling backdoors, as the fault attacks at the training phase inject backdoors into the network that allow an attacker to produce fooling inputs. We evaluate our approach against multi-layer perceptron networks and convolutional networks on a popular image classification task, obtaining high attack success rates (from 60% to 100%) and high classification confidence when as few as 25 neurons are attacked, while preserving high accuracy on the originally intended classification task.

【2】 Adversarial Transfer Attacks With Unknown Data and Class Overlap
Link: https://arxiv.org/abs/2109.11125

Authors: Luke E. Richards, André Nguyen, Ryan Capps, Steven Forsythe, Cynthia Matuszek, Edward Raff
Affiliations: Booz Allen Hamilton, Univ. of MD, Baltimore County, USA, NVIDIA
Comments: To appear in Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security
Abstract: The ability to transfer adversarial attacks from one model (the surrogate) to another model (the victim) has been an issue of concern within the machine learning (ML) community. The ability to successfully evade unseen models represents an uncomfortable level of ease toward implementing attacks. In this work we note that, as studied, current transfer attack research gives the attacker an unrealistic advantage: the attacker has the exact same training data as the victim. We present the first study of transferring adversarial attacks that focuses on the data available to attacker and victim under imperfect settings without querying the victim, where there is some variable level of overlap in the exact data used or in the classes learned by each model. This threat model is relevant to applications in medicine, malware, and others. Under this new threat model, attack success rate is not correlated with data or class overlap in the way one would expect, and varies with the dataset. This makes it difficult for attacker and defender to reason about each other and contributes to the broader study of model robustness and security. We remedy this by developing a masked version of Projected Gradient Descent that simulates class disparity, which enables the attacker to reliably estimate a lower bound on their attack's success.

【3】 Improved Evolutionary Generative Adversarial Networks
Link: https://arxiv.org/abs/2109.11078

Authors: Junjie Li, Shuai Lü
Affiliations: Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Jilin University
Comments: arXiv admin note: text overlap with arXiv:2101.11186
Abstract: Evolutionary generative adversarial networks (E-GAN) attempt to alleviate the mode collapse and vanishing gradients that plague generative adversarial networks by introducing evolutionary computation. However, because it lacks a reasonable evaluation mechanism, E-GAN does not achieve its design purpose. Moreover, its evolutionary step contains only mutation operators and no crossover operators, which are equally common in evolutionary computation. In this paper, we first point out the shortcomings of the diversity fitness function of E-GAN and propose a new function. Then we propose a universal crossover operator based on knowledge distillation, which can be widely applied to evolutionary GANs and supplies the crossover variation missing from E-GAN. Incorporating the fitness function and crossover operator, we design an evolutionary GAN framework named improved evolutionary generative adversarial networks (IE-GAN) and combine it with E-GAN to complete an algorithm implementation. Experiments on various datasets demonstrate the effectiveness of IE-GAN and show that our method is competitive in terms of generated sample quality and time efficiency.

【4】 A two-step machine learning approach for crop disease detection: an application of GAN and UAV technology
Link: https://arxiv.org/abs/2109.11066

Authors: Aaditya Prasad, Nikhil Mehta, Matthew Horak, Wan D. Bae
Comments: 13 pages, 5 figures. Preprint of an article submitted for consideration in the International Journal on Artificial Intelligence Tools, 2021, World Scientific Publishing Company
Abstract: Automated plant diagnosis is a technology that promises large increases in cost-efficiency for agriculture. However, multiple problems reduce the effectiveness of drones, including the inverse relationship between resolution and speed and the lack of adequate labeled training data. This paper presents a two-step machine learning approach that analyzes low-fidelity and high-fidelity images in sequence, preserving efficiency as well as accuracy. Two data generators are also used to minimize class imbalance in the high-fidelity dataset and to produce low-fidelity data that is representative of UAV images. The analysis of applications and methods is conducted on a database of high-fidelity apple tree images that suffers from class imbalance. The application begins by generating high-fidelity data using generative networks and then uses this novel data alongside the original high-fidelity data to produce low-fidelity images. A machine learning identifier identifies plants and labels them as potentially diseased or not. A machine learning classifier is then given the potentially diseased plant images and returns actual diagnoses for these plants. The results show an accuracy of 96.3% for the high-fidelity system and a 75.5% confidence level for our low-fidelity system. Our drone technology shows promising results in accuracy when compared to labor-based methods of diagnosis.

【5】 Revisit Geophysical Imaging in A New View of Physics-informed Generative Adversarial Learning
Link: https://arxiv.org/abs/2109.11452

Authors: Fangshu Yang, Jianwei Ma
Affiliations: School of Mathematics, Harbin Institute of Technology; School of Earth and Space Sciences, Peking University
Abstract: Seismic full waveform inversion (FWI) is a powerful geophysical imaging technique that produces high-resolution subsurface models by iteratively minimizing the misfit between the simulated and observed seismograms. Unfortunately, conventional FWI with a least-squares function suffers from many drawbacks, such as the local-minima problem and the computation of explicit gradients. It is particularly challenging with contaminated measurements or poor starting models. Recent works relying on partial differential equations and neural networks show promising performance for two-dimensional FWI. Inspired by the competitive learning of generative adversarial networks, we propose an unsupervised learning paradigm that integrates the wave equation with a discriminative network to accurately estimate physically consistent models in a distribution sense. Our framework needs no labelled training data or pretraining of the network, and is flexible enough to achieve multi-parameter inversion with minimal user interaction. The proposed method faithfully recovers well-known synthetic models and outperforms the classical algorithms. Furthermore, our work paves the way to sidestepping the local-minima issue by reducing the sensitivity to initial models and noise.

【6】 On Optimal Robustness to Adversarial Corruption in Online Decision Problems
Link: https://arxiv.org/abs/2109.10963

Authors: Shinji Ito
Affiliations: NEC Corporation
Abstract: This paper considers two fundamental sequential decision-making problems: the problem of prediction with expert advice and the multi-armed bandit problem. We focus on stochastic regimes in which an adversary may corrupt losses, and we investigate what level of robustness can be achieved against adversarial corruptions. The main contribution of this paper is to show that optimal robustness can be expressed by a square-root dependency on the amount of corruption. More precisely, we show that two classes of algorithms, anytime Hedge with decreasing learning rate and algorithms with second-order regret bounds, achieve $O( \frac{\log N}{\Delta} + \sqrt{ \frac{C \log N }{\Delta} } )$-regret, where $N$, $\Delta$, and $C$ represent the number of experts, the gap parameter, and the corruption level, respectively. We further provide a matching lower bound, which means that this regret bound is tight up to a constant factor. For the multi-armed bandit problem, we also provide a nearly tight lower bound up to a logarithmic factor.
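Illustration: the abstract refers to anytime Hedge with a decreasing learning rate and to a regret bound in terms of $N$, $\Delta$, and $C$. The sketch below shows the textbook exponential-weights update with the schedule eta_t = sqrt(log N / t), which is an assumed choice and may differ from the paper's exact schedule. The helper `regret_bound_shape` evaluates the expression inside the $O(\cdot)$ with all hidden constants dropped. All function names are hypothetical.

```python
import math


def hedge_weights(cumulative_losses, t):
    """Exponential weights at round t with eta_t = sqrt(log N / t) (assumed schedule)."""
    n = len(cumulative_losses)
    eta = math.sqrt(math.log(n) / t)
    smallest = min(cumulative_losses)  # shift losses for numerical stability
    raw = [math.exp(-eta * (loss - smallest)) for loss in cumulative_losses]
    total = sum(raw)
    return [r / total for r in raw]


def run_hedge(loss_rounds):
    """Play Hedge on a list of per-round loss vectors and return the regret
    against the best single expert in hindsight."""
    n = len(loss_rounds[0])
    cumulative = [0.0] * n
    learner_loss = 0.0
    for t, losses in enumerate(loss_rounds, start=1):
        probs = hedge_weights(cumulative, t)
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return learner_loss - min(cumulative)


def regret_bound_shape(n, gap, corruption):
    """log(N)/Delta + sqrt(C log(N)/Delta), constants omitted."""
    return math.log(n) / gap + math.sqrt(corruption * math.log(n) / gap)


# Two experts; the first always incurs loss 0 and the second always loss 1.
rounds = [[0.0, 1.0]] * 50
print(run_hedge(rounds))
print(regret_bound_shape(n=2, gap=1.0, corruption=0.0))
```

With zero corruption the second term vanishes and only the log(N)/Delta term remains, which matches the square-root dependence on the corruption level described in the abstract.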

Semi-/Weakly/Un-/Supervised | Uncertainty | Active Learning (7 papers)

【1】 How much "human-like" visual experience do current self-supervised learning algorithms need to achieve human-level object recognition?
Link: https://arxiv.org/abs/2109.11523

Authors: A. Emin Orhan
Affiliations: New York University
Comments: 9 pages, 1 figure, 1 table
Abstract: This paper addresses a fundamental question: how good are our current self-supervised visual representation learning algorithms relative to humans? More concretely, how much "human-like", natural visual experience would these algorithms need in order to reach human-level performance in a complex, realistic visual object recognition task such as ImageNet? Using a scaling experiment, here we estimate that the answer is on the order of a million years of natural visual experience, in other words several orders of magnitude longer than a human lifetime. However, this estimate is quite sensitive to some underlying assumptions, underscoring the need to run carefully controlled human experiments. We discuss the main caveats surrounding our estimate and the implications of this rather surprising result.

【2】 Learning to Robustly Aggregate Labeling Functions for Semi-supervised Data Programming
Link: https://arxiv.org/abs/2109.11410

Authors: Ayush Maheshwari, Krishnateja Killamsetty, Ganesh Ramakrishnan, Rishabh Iyer, Marina Danilevsky, Lucian Popa
Affiliations: Department of CSE, IIT Bombay, India; The University of Texas at Dallas; IBM Research – Almaden
Abstract: A critical bottleneck in supervised machine learning is the need for large amounts of labeled data, which is expensive and time-consuming to obtain. However, it has been shown that a small amount of labeled data, while insufficient to re-train a model, can be effectively used to generate human-interpretable labeling functions (LFs). These LFs, in turn, have been used to generate a large amount of additional noisy labeled data, in a paradigm that is now commonly referred to as data programming. However, previous approaches to automatically generating LFs make no attempt to further use the given labeled data for model training, thus giving up opportunities for improved performance. Moreover, since the LFs are generated from a relatively small labeled dataset, they are prone to being noisy, and naively aggregating these LFs can lead to very poor performance in practice. In this work, we propose an LF-based reweighting framework \ouralgo{} to solve these two critical limitations. Our algorithm learns a joint model on the (same) labeled dataset used for LF induction along with any unlabeled data in a semi-supervised manner and, more critically, reweighs each LF according to its goodness, influencing its contribution to the semi-supervised loss using a robust bi-level optimization algorithm. We show that our algorithm significantly outperforms prior approaches on several text classification datasets.

【3】 WRENCH: A Comprehensive Benchmark for Weak Supervision
Link: https://arxiv.org/abs/2109.11377

Authors: Jieyu Zhang, Yue Yu, Yinghao Li, Yujing Wang, Yaming Yang, Mao Yang, Alexander Ratner
Affiliations: Microsoft Research Asia, University of Washington, Georgia Institute of Technology
Abstract: Recent Weak Supervision (WS) approaches have had widespread success in easing the bottleneck of labeling training data for machine learning by synthesizing labels from multiple potentially noisy supervision sources. However, proper measurement and analysis of these approaches remain a challenge. First, datasets used in existing works are often private and/or custom, limiting standardization. Second, WS datasets with the same name and base data often vary in terms of the labels and weak supervision sources used, a significant "hidden" source of evaluation variance. Finally, WS studies often diverge in terms of the evaluation protocol and ablations used. To address these problems, we introduce a benchmark platform, \benchmark, for a thorough and standardized evaluation of WS approaches. It consists of 22 varied real-world datasets for classification and sequence tagging; a range of real, synthetic, and procedurally generated weak supervision sources; and a modular, extensible framework for WS evaluation, including implementations of popular WS methods. We use \benchmark to conduct extensive comparisons over more than 100 method variants to demonstrate its efficacy as a benchmark platform. The code is available at https://github.com/JieyuZ2/wrench.

【4】 Active Learning for Argument Strength Estimation
Link: https://arxiv.org/abs/2109.11319

Authors: Nataliia Kees, Michael Fromm, Evgeniy Faerman, Thomas Seidl
Affiliations: LMU Munich, Germany
Abstract: High-quality arguments are an essential part of decision-making. Automatically predicting the quality of an argument is a complex task that has recently received much attention in argument mining. However, the annotation effort for this task is exceptionally high. Therefore, we test uncertainty-based active learning (AL) methods on two popular argument-strength data sets to estimate whether sample-efficient learning can be enabled. Our extensive empirical evaluation shows that uncertainty-based acquisition functions cannot surpass the accuracy reached with random acquisition on these data sets.

【5】 A Survey on Cost Types, Interaction Schemes, and Annotator Performance Models in Selection Algorithms for Active Learning in Classification
Link: https://arxiv.org/abs/2109.11301

Authors: Marek Herde, Denis Huseljic, Bernhard Sick, Adrian Calma
Affiliations: This research was supported by the CIL project at the University of Kassel under internal funding P7 10 and P 108 2
Abstract: Pool-based active learning (AL) aims to optimize the annotation process (i.e., labeling), as the acquisition of annotations is often time-consuming and therefore expensive. For this purpose, an AL strategy queries annotations intelligently from annotators to train a high-performance classification model at a low annotation cost. Traditional AL strategies operate in an idealized framework. They assume a single, omniscient annotator who never gets tired and charges uniformly regardless of query difficulty. However, in real-world applications, we often face human annotators, e.g., crowd or in-house workers, who make annotation mistakes and can be reluctant to respond if tired or faced with complex queries. Recently, a wide range of novel AL strategies has been proposed to address these issues. They differ from traditional AL in at least one of the following three central aspects: (1) They explicitly consider (multiple) human annotators whose performance can be affected by various factors, such as missing expertise. (2) They generalize the interaction with human annotators by considering different query and annotation types, such as asking an annotator for feedback on an inferred classification rule. (3) They take more complex cost schemes regarding annotations and misclassifications into account. This survey provides an overview of these AL strategies and refers to them as real-world AL. To this end, we introduce a general real-world AL strategy as part of a learning cycle and use its elements, e.g., the query and annotator selection algorithm, to categorize about 60 real-world AL strategies. Finally, we outline possible directions for future research in the field of AL.

【6】 Multi-view Contrastive Self-Supervised Learning of Accounting Data Representations for Downstream Audit Tasks
Link: https://arxiv.org/abs/2109.11201

Authors: Marco Schreyer, Timur Sattarov, Damian Borth
Affiliations: University of St. Gallen, St. Gallen, Switzerland; Deutsche Bundesbank, Frankfurt am Main, Germany
Comments: 8 pages (excl. appendix), 4 figures, 3 tables
Abstract: International audit standards require the direct assessment of a financial statement's underlying accounting transactions, referred to as journal entries. Recently, driven by advances in artificial intelligence, deep-learning-inspired audit techniques have emerged in the field of auditing vast quantities of journal entry data. Nowadays, the majority of such methods rely on a set of specialized models, each trained for a particular audit task. At the same time, when conducting a financial statement audit, audit teams are confronted with (i) challenging time-budget constraints, (ii) extensive documentation obligations, and (iii) strict model interpretability requirements. As a result, auditors prefer to harness only a single, preferably 'multi-purpose', model throughout an audit engagement. We propose a contrastive self-supervised learning framework designed to learn audit-task-invariant accounting data representations to meet this requirement. The framework encompasses deliberate interacting data augmentation policies that utilize the attribute characteristics of journal entry data. We evaluate the framework on two real-world datasets of city payments and transfer the learned representations to three downstream audit tasks: anomaly detection, audit sampling, and audit documentation. Our experimental results provide empirical evidence that the proposed framework offers the ability to increase the efficiency of audits by learning rich and interpretable 'multi-task' representations.

【7】 Mixed-supervised segmentation: Confidence maximization helps knowledge distillation
标题：混合监督分割：置信度最大化有助于知识提炼
链接：https://arxiv.org/abs/2109.10902

作者：Bingyuan Liu,Christian Desrosiers,Ismail Ben Ayed,Jose Dolz
机构：ÉTS Montréal, Centre de recherche du Centre hospitalier de l’Université de Montréal (CRCHUM), A R T I C L E, I N F O, Article history:
备注：13 pages, 10 figures. Currently under review at Medical Image Analysis. Code available at this https URL arXiv admin note: substantial text overlap with arXiv:2012.08051
摘要：尽管在广泛的医学图像分割任务中取得了有希望的结果，但深度神经网络需要具有像素注释的大型训练数据集。获取这些经过整理的数据集是一个繁琐的过程，这限制了在缺少注释图像的场景中的应用。混合监控是缓解这一障碍的一种很有吸引力的选择，因为只有一小部分数据包含完整的像素注释，而其他图像的监控形式较弱。在这项工作中，我们提出了一种双分支体系结构，其中上层分支（教师）接受强注释，而底层分支（学生）受有限的监督和上层分支的指导。结合标记像素上的标准交叉熵损失，我们的新公式集成了两个重要术语：（i）在较少监督的图像上定义的香农熵损失，这鼓励学生在底部分支中进行自信的预测；（ii）Kullback-Leibler（KL）散度项，它将强监督分支的知识转移到弱监督分支，并引导熵（学生信心）项避免平凡解。我们证明了熵和KL散度之间的协同作用在性能上产生了实质性的改进。我们还讨论了香农熵最小化和标准伪掩模生成之间的一个有趣的联系，并认为前者比后者更适合利用未标记像素的信息。在两个公开数据集上的定量和定性结果表明，我们的方法明显优于混合监督框架下的其他语义分割策略，以及最近的半监督方法。此外，我们还表明，在减少监督的情况下接受训练并由顶级分支指导的分支在很大程度上优于后者。
摘要：Despite achieving promising results in a breadth of medical image segmentation tasks, deep neural networks require large training datasets with pixel-wise annotations. Obtaining these curated datasets is a cumbersome process which limits the application in scenarios where annotated images are scarce. Mixed supervision is an appealing alternative for mitigating this obstacle, where only a small fraction of the data contains complete pixel-wise annotations and other images have a weaker form of supervision. In this work, we propose a dual-branch architecture, where the upper branch (teacher) receives strong annotations, while the bottom one (student) is driven by limited supervision and guided by the upper branch. Combined with a standard cross-entropy loss over the labeled pixels, our novel formulation integrates two important terms: (i) a Shannon entropy loss defined over the less-supervised images, which encourages confident student predictions in the bottom branch; and (ii) a Kullback-Leibler (KL) divergence term, which transfers the knowledge of the strongly supervised branch to the less-supervised branch and guides the entropy (student-confidence) term to avoid trivial solutions. We show that the synergy between the entropy and KL divergence yields substantial improvements in performance. We also discuss an interesting link between Shannon-entropy minimization and standard pseudo-mask generation, and argue that the former should be preferred over the latter for leveraging information from unlabeled pixels. Quantitative and qualitative results on two publicly available datasets demonstrate that our method significantly outperforms other strategies for semantic segmentation within a mixed-supervision framework, as well as recent semi-supervised approaches. Moreover, we show that the branch trained with reduced supervision and guided by the top branch largely outperforms the latter.
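
The three terms named in the abstract (cross-entropy on labeled pixels, Shannon entropy of the student on less-supervised images, and a KL term from teacher to student) can be sketched as follows. This is only an illustration under assumptions: the weights `lam_ent` and `lam_kl`, the direction of the KL term, and the per-pixel averaging are hypothetical choices, not details taken from the paper.

```python
import math

EPS = 1e-12  # guards against log(0); an implementation detail of this sketch


def entropy(p):
    """Shannon entropy of one probability vector (lower means more confident)."""
    return -sum(q * math.log(q + EPS) for q in p)


def kl_divergence(p, q):
    """KL(p || q) for two probability vectors of equal length."""
    return sum(a * math.log((a + EPS) / (b + EPS)) for a, b in zip(p, q))


def cross_entropy(label, p):
    """Cross-entropy of a predicted distribution p against a hard label."""
    return -math.log(p[label] + EPS)


def student_loss(labeled, unlabeled, lam_ent=1.0, lam_kl=1.0):
    """Combine the three terms for the student branch.

    labeled:   list of (label, student_probs) for annotated pixels
    unlabeled: list of (teacher_probs, student_probs) for the other pixels
    lam_ent, lam_kl: hypothetical weighting factors (assumed, not from the paper)
    """
    ce = sum(cross_entropy(y, s) for y, s in labeled) / max(len(labeled), 1)
    ent = sum(entropy(s) for _, s in unlabeled) / max(len(unlabeled), 1)
    kl = sum(kl_divergence(t, s) for t, s in unlabeled) / max(len(unlabeled), 1)
    return ce + lam_ent * ent + lam_kl * kl
```

In this sketch a student that is confident and agrees with the teacher gets a lower loss than one that is uncertain, which is the behavior the entropy and KL terms are described as encouraging.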

迁移|Zero/Few/One-Shot|自适应(2篇)

【1】 Zero-Shot Information Extraction as a Unified Text-to-Triple Translation
标题：将Zero-Shot信息抽取视为统一的文本到三元组翻译
链接：https://arxiv.org/abs/2109.11171

作者：Chenguang Wang,Xiao Liu,Zui Chen,Haoyun Hong,Jie Tang,Dawn Song
机构：UC Berkeley, Tsinghua University
备注：EMNLP 2021; 14 pages, 5 figures, 9 tables
摘要：我们将一组信息抽取任务统一到文本到三元组的翻译框架中。我们不依赖特定于任务的数据集和模型来分别解决每个任务，而是将任务形式化为特定于任务的输入文本与输出三元组之间的翻译。通过接受特定于任务的输入，我们可以利用预训练语言模型关于该任务的潜在知识，实现任务无关的翻译。我们进一步证明，预测哪些关系信息对应于哪些输入文本这一简单的预训练任务，是产生特定任务输出的有效方法。这使得我们的框架能够零样本（zero-shot）迁移到下游任务。我们研究了该框架在开放信息抽取（OIE2016、NYT、WEB、PENN）、关系分类（FewRel和TACRED）和事实探测（Google-RE和T-REx）上的Zero-Shot性能。该模型可以非平凡地迁移到大多数任务，并且在无需任何特定于任务的训练的情况下，通常可与完全监督的方法相媲美。例如，我们在无需使用其训练集的情况下，F1分数显著优于有监督的开放信息抽取方法。
摘要：We cast a suite of information extraction tasks into a text-to-triple translation framework. Instead of solving each task relying on task-specific datasets and models, we formalize the task as a translation between task-specific input text and output triples. By taking the task-specific input, we enable a task-agnostic translation by leveraging the latent knowledge that a pre-trained language model has about the task. We further demonstrate that a simple pre-training task of predicting which relational information corresponds to which input text is an effective way to produce task-specific outputs. This enables the zero-shot transfer of our framework to downstream tasks. We study the zero-shot performance of this framework on open information extraction (OIE2016, NYT, WEB, PENN), relation classification (FewRel and TACRED), and factual probe (Google-RE and T-REx). The model transfers non-trivially to most tasks and is often competitive with a fully supervised method without the need for any task-specific training. For instance, we significantly outperform the F1 score of the supervised open information extraction without needing to use its training set.

【2】 On the equivalence of different adaptive batch size selection strategies for stochastic gradient descent methods
标题：随机梯度下降法不同自适应批量选择策略的等价性
链接：https://arxiv.org/abs/2109.10933

作者：Luis Espath,Sebastian Krumscheid,Raúl Tempone,Pedro Vilanova
机构：King Abdullah University of Science & Technology (KAUST)
摘要：在这项研究中，我们证明了当$\epsilon^2=\theta^2+\nu^2$具有特定的$\theta$和$\nu$选择时，{Bol18}中给出的范数检验和内积/正交性检验在与随机梯度下降（SGD）方法相关的收敛速度方面是等价的。此处，$\epsilon$控制渐变范数的相对统计误差，$\theta$和$\nu$分别控制渐变方向和与渐变正交方向上渐变的相对统计误差。此外，我们还证明了在最佳情况下，如果选择$\theta$和$\nu$，内积/正交性测试可以与范数测试一样便宜，但是如果$\epsilon^2=\theta^2+\nu^2$，内积/正交性测试在计算上永远不会比范数测试更便宜。最后，我们提出了两个随机优化问题来说明我们的结果。
摘要：In this study, we demonstrate that the norm test and inner product/orthogonality test presented in \cite{Bol18} are equivalent in terms of the convergence rates associated with Stochastic Gradient Descent (SGD) methods if $\epsilon^2=\theta^2+\nu^2$ with specific choices of $\theta$ and $\nu$. Here, $\epsilon$ controls the relative statistical error of the norm of the gradient while $\theta$ and $\nu$ control the relative statistical error of the gradient in the direction of the gradient and in the direction orthogonal to the gradient, respectively. Furthermore, we demonstrate that the inner product/orthogonality test can be as inexpensive as the norm test in the best case scenario if $\theta$ and $\nu$ are optimally selected, but the inner product/orthogonality test will never be more computationally affordable than the norm test if $\epsilon^2=\theta^2+\nu^2$. Finally, we present two stochastic optimization problems to illustrate our results.
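
One way to see why a relation of the form $\epsilon^2=\theta^2+\nu^2$ is natural: the squared deviation of a sampled gradient from the mean gradient splits exactly into a part along the mean gradient and a part orthogonal to it. The sketch below checks this numerically on synthetic data. It assumes, as the abstract describes, that $\epsilon$ bounds the total error while $\theta$ and $\nu$ bound the parallel and orthogonal parts; the exact test statistics of the paper are not reproduced here, and the sample gradients are made up.

```python
import math
import random


def dot(a, b):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def decompose_variance(grads):
    """Return (total, parallel, orthogonal) sums of squared deviations.

    Deviations are taken from the mean gradient; 'parallel' is the component
    along the mean gradient, 'orthogonal' is the remainder.
    """
    n, d = len(grads), len(grads[0])
    mean = [sum(g[k] for g in grads) / n for k in range(d)]
    norm = math.sqrt(dot(mean, mean))
    unit = [m / norm for m in mean]  # assumes a nonzero mean gradient
    total = parallel = orthogonal = 0.0
    for g in grads:
        dev = [gk - mk for gk, mk in zip(g, mean)]
        along = dot(dev, unit)
        rest = [dk - along * uk for dk, uk in zip(dev, unit)]
        total += dot(dev, dev)
        parallel += along ** 2
        orthogonal += dot(rest, rest)
    return total, parallel, orthogonal


# Synthetic per-sample gradients (hypothetical data for illustration only).
random.seed(0)
grads = [[random.gauss(1.0, 0.5) for _ in range(4)] for _ in range(32)]
total, parallel, orthogonal = decompose_variance(grads)
```

By the Pythagorean theorem, `total` equals `parallel + orthogonal` up to floating-point rounding, so bounds on the two parts add up to a bound on the whole.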

强化学习(4篇)

【1】 Reinforcement Learning Under Algorithmic Triage
标题：算法分流下的强化学习
链接：https://arxiv.org/abs/2109.11328

作者：Eleni Straitouri,Adish Singla,Vahid Balazadeh Meresht,Manuel Gomez-Rodriguez
机构：Max Planck Institute for Software Systems, University of Toronto
摘要：算法分类下的学习方法主要集中于监督学习环境，其中每个决策或预测相互独立。在算法分类下，监督学习模型预测一小部分实例，人类预测其余的实例。在这项工作中，我们朝着开发强化学习模型迈出了第一步，该模型经过优化，可在算法分类下运行。为此，我们通过选项框架来研究这个问题，并开发了一种两阶段的演员-评论家方法来学习分类下的强化学习模型。第一阶段使用在人员自行操作的环境中收集的人员数据执行离线、非策略训练。第二阶段进行政策训练，以说明转换可能对人力政策产生的影响，这可能很难从上述人力数据中预测。在一个合成汽车驾驶任务中进行的大量仿真实验表明，使用我们的两阶段方法训练的机器模型和分类策略有效地补充了人工策略，并且优于几个竞争基线提供的策略。
摘要：Methods to learn under algorithmic triage have predominantly focused on supervised learning settings where each decision, or prediction, is independent of each other. Under algorithmic triage, a supervised learning model predicts a fraction of the instances and humans predict the remaining ones. In this work, we take a first step towards developing reinforcement learning models that are optimized to operate under algorithmic triage. To this end, we look at the problem through the framework of options and develop a two-stage actor-critic method to learn reinforcement learning models under triage. The first stage performs offline, off-policy training using human data gathered in an environment where the human has operated on their own. The second stage performs on-policy training to account for the impact that switching may have on the human policy, which may be difficult to anticipate from the above human data. Extensive simulation experiments in a synthetic car driving task show that the machine models and the triage policies trained using our two-stage method effectively complement human policies and outperform those provided by several competitive baselines.

【2】 Enhancing Navigational Safety in Crowded Environments using Semantic-Deep-Reinforcement-Learning-based Navigation
标题：基于语义深度强化学习的导航提高拥挤环境下导航安全
链接：https://arxiv.org/abs/2109.11288

作者：Linh Kästner,Junhui Li,Zhengcheng Shen,Jens Lambrecht
机构： Berlin Institute of Technology
备注：7 pages, 5 figures, 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
摘要：在社交人群中进行智能导航是移动机器人应用的一个重要方面，如交付、医疗或援助。深度强化学习作为一种替代保守方法的规划方法出现，并有望实现更高效、更灵活的导航。然而，在采用不同类型障碍物的高度动态环境中，安全导航仍然是一个巨大的挑战。在本文中，我们提出了一种基于语义深度强化学习的导航方法，该方法通过考虑高层障碍物信息来教授特定于对象的安全规则。特别是，代理通过考虑特定的危险区域来学习对象特定的行为，以增强易受攻击对象类周围的安全性。我们将该方法与基准避障方法进行了对比测试，发现安全性有所提高。此外，我们还证明了agent可以通过保持依赖于语义信息的个体安全距离来学习更安全的导航。
摘要：Intelligent navigation among social crowds is an essential aspect of mobile robotics for applications such as delivery, health care, or assistance. Deep Reinforcement Learning emerged as an alternative planning method to conservative approaches and promises more efficient and flexible navigation. However, in highly dynamic environments employing different kinds of obstacle classes, safe navigation still presents a grand challenge. In this paper, we propose a semantic Deep-reinforcement-learning-based navigation approach that teaches object-specific safety rules by considering high-level obstacle information. In particular, the agent learns object-specific behavior by contemplating the specific danger zones to enhance safety around vulnerable object classes. We tested the approach against a benchmark obstacle avoidance approach and found an increase in safety. Furthermore, we demonstrate that the agent could learn to navigate more safely by keeping an individual safety distance dependent on the semantic information.

【3】 Tactile Grasp Refinement using Deep Reinforcement Learning and Analytic Grasp Stability Metrics
标题：基于深度强化学习和解析抓取稳定性度量的触觉抓取精化
链接：https://arxiv.org/abs/2109.11234

作者：Alexander Koenig,Zixi Liu,Lucas Janson,Robert Howe
机构：Department of Informatics, Technical University of Munich; School of Engineering and Applied Sciences, Harvard University; Departments of Statistics, Harvard University; RightHand Robotics
备注：paper currently under review, 7 pages, 10 figures, video: this https URL, code: this https URL
摘要：奖励函数是每个强化学习（RL）算法的核心。在机器人抓取中，奖励通常是复杂的人工设计功能，不依赖抓取分析中合理的物理模型。这项工作表明，解析抓取稳定性指标构成了RL算法的强大优化目标，该算法仅使用触觉和关节位置信息来优化三指手的抓取。我们的表现优于二元奖励基线42.9%，并发现几何和力不可知抓取稳定性指标的组合产生了最高的平均成功率，长方体为95.4%，圆柱体为93.1%，球体为62.3%，手腕位置误差在0到7厘米之间，旋转误差在0到14度之间。在第二个实验中，我们表明，使用接触反馈（接触位置、法线和力）训练的抓取细化算法比不接收触觉信息的基线性能高出6.6%。
摘要：Reward functions are at the heart of every reinforcement learning (RL) algorithm. In robotic grasping, rewards are often complex and manually engineered functions that do not rely on well-justified physical models from grasp analysis. This work demonstrates that analytic grasp stability metrics constitute powerful optimization objectives for RL algorithms that refine grasps on a three-fingered hand using only tactile and joint position information. We outperform a binary-reward baseline by 42.9% and find that a combination of geometric and force-agnostic grasp stability metrics yields the highest average success rates of 95.4% for cuboids, 93.1% for cylinders, and 62.3% for spheres across wrist position errors between 0 and 7 centimeters and rotational errors between 0 and 14 degrees. In a second experiment, we show that grasp refinement algorithms trained with contact feedback (contact positions, normals, and forces) perform up to 6.6% better than a baseline that receives no tactile information.

【4】 Hierarchies of Planning and Reinforcement Learning for Robot Navigation
标题：机器人导航的分层规划与强化学习
链接：https://arxiv.org/abs/2109.11178

作者：Jan Wöhlke,Felix Schmitt,Herke van Hoof
备注：7 pages, 5 figures, accepted for 2021 IEEE International Conference on Robotics and Automation (ICRA)
摘要：由于奖励稀疏且决策范围长，通过强化学习（RL）解决机器人导航任务具有挑战性。但是，在许多导航任务中，可以使用高层（HL）任务表示，如粗略的平面布置图。之前的工作已经证明了分层方法可以实现高效学习：在HL表示中进行路径规划，并使用从规划中得到的子目标来指导源任务中的RL策略。然而，这些方法在规划过程中通常忽略了机器人的复杂动力学及其次优的子目标达成能力。这项工作克服了这些局限性，提出了一种新的分层框架，对HL表示使用可训练的规划策略。由此，可以利用收集到的rollout数据学习机器人的能力和环境条件。我们特别介绍了一种基于价值迭代和学习到的转移模型的规划策略（VI-RL）。在模拟机器人导航任务中，VI-RL相比普通（vanilla）RL取得了一致且显著的改进；在单一布局上与普通分层RL相当，但更广泛地适用于多个布局；并且与可训练的HL路径规划基线相当，只有在具有困难的非完整动力学的停车任务上例外，VI-RL在该任务上表现出明显的改进。
摘要：Solving robotic navigation tasks via reinforcement learning (RL) is challenging due to their sparse reward and long decision horizon nature. However, in many navigation tasks, high-level (HL) task representations, like a rough floor plan, are available. Previous work has demonstrated efficient learning by hierarchical approaches consisting of path planning in the HL representation and using sub-goals derived from the plan to guide the RL policy in the source task. However, these approaches usually neglect the complex dynamics and sub-optimal sub-goal-reaching capabilities of the robot during planning. This work overcomes these limitations by proposing a novel hierarchical framework that utilizes a trainable planning policy for the HL representation. Thereby robot capabilities and environment conditions can be learned utilizing collected rollout data. We specifically introduce a planning policy based on value iteration with a learned transition model (VI-RL). In simulated robotic navigation tasks, VI-RL results in consistent strong improvement over vanilla RL, is on par with vanilla hierarchical RL on single layouts but more broadly applicable to multiple layouts, and is on par with trainable HL path planning baselines except for a parking task with difficult non-holonomic dynamics where it shows marked improvements.

医学相关(3篇)

【1】 End-to-End AI-based MRI Reconstruction and Lesion Detection Pipeline for Evaluation of Deep Learning Image Reconstruction
标题：用于深度学习图像重建评估的端到端基于AI的MRI重建和病变检测流水线
链接：https://arxiv.org/abs/2109.11524

作者：Ruiyang Zhao,Yuxin Zhang,Burhaneddin Yaman,Matthew P. Lungren,Michael S. Hansen
机构： Microsoft Research, Health Futures, University of Wisconsin-Madison, Department of Medical Physics, University of Minnesota, Department of Electrical and Computer Engineering, Stanford University, Department of Radiology
摘要：深度学习技术已经成为一种很有前途的高加速MRI方法。然而，最近的重建挑战显示了当前深度学习方法的一些缺点，包括即使使用在全局质量度量方面表现良好的模型，也会丢失精细的图像细节。在这项研究中，我们提出了一个用于图像重建和病理检测的端到端深度学习框架，它能够对深度学习重建质量进行临床感知评估。该解决方案在膝关节MRI研究中用于检测半月板撕裂的用例中进行了演示，最终发现使用常用重建方法的精细图像细节丢失，表现为检测半月板撕裂等重要病理学的能力降低。尽管使用SSIM等指标进行定量重建方法评估的常见做法，但作为基于病理学的自动化重建评估方法的受损病理学检测表明，现有的定量方法无法捕获临床上重要的重建结果。
摘要：Deep learning techniques have emerged as a promising approach to highly accelerated MRI. However, recent reconstruction challenges have shown several drawbacks in current deep learning approaches, including the loss of fine image details even using models that perform well in terms of global quality metrics. In this study, we propose an end-to-end deep learning framework for image reconstruction and pathology detection, which enables a clinically aware evaluation of deep learning reconstruction quality. The solution is demonstrated for a use case in detecting meniscal tears on knee MRI studies, ultimately finding a loss of fine image details with common reconstruction methods expressed as a reduced ability to detect important pathology like meniscal tears. Despite the common practice of quantitative reconstruction methodology evaluation with metrics such as SSIM, impaired pathology detection as an automated pathology-based reconstruction evaluation approach suggests existing quantitative methods do not capture clinically important reconstruction outcomes.

【2】 Predicting the Timing of Camera Movements From the Kinematics of Instruments in Robotic-Assisted Surgery Using Artificial Neural Networks
标题：利用人工神经网络从机器人辅助手术器械运动学预测摄像机运动时间
链接：https://arxiv.org/abs/2109.11192

作者：Hanna Kossowsky,Ilana Nisky
机构：Department of Biomedical Engineering and the Zlotowski Center for Neuroscience, Ben-Gurion University of the Negev
摘要：机器人辅助手术对外科医生和患者都有好处，然而，外科医生经常需要调整内窥镜摄像头以获得良好的视角。同时控制摄像机和手术器械是不可能的，因此，这些摄像机调整会反复中断手术。自主摄像机控制有助于克服这一挑战，但大多数现有系统都是被动的，例如，让摄像机跟随手术器械。我们提出了一种使用人工神经网络预测摄像机何时移动的预测方法。我们使用了手术器械的运动学数据，这些数据是在猪模型的机器人辅助手术训练期间记录的。我们将数据分割为多个段，并将每个段标记为相机移动之前的一个段，或不移动的一个段。由于大类的不平衡性，我们训练了一组网络，每个网络都基于一个平衡的训练数据子集。我们发现，仪器的运动学数据可用于预测摄像机何时移动，并评估不同分段持续时间和集合大小下的性能。我们还研究了可以提前多少时间预测即将到来的摄像机运动，发现在摄像机运动发生前0.25、0.5和1秒预测摄像机运动，相对于预测即将到来的摄像机运动而言，准确率分别为98%、94%和84%。这表明可以提前预测摄像机运动事件，以便留出时间来计算和执行自主摄像机运动，并表明RAMIS的自主摄像机控制器有朝一日可能是可行的。
摘要：Robotic-assisted surgeries benefit both surgeons and patients; however, surgeons frequently need to adjust the endoscopic camera to achieve good viewpoints. Simultaneously controlling the camera and the surgical instruments is impossible, and consequently, these camera adjustments repeatedly interrupt the surgery. Autonomous camera control could help overcome this challenge, but most existing systems are reactive, e.g., by having the camera follow the surgical instruments. We propose a predictive approach for anticipating when camera movements will occur using artificial neural networks. We used the kinematic data of the surgical instruments, which were recorded during robotic-assisted surgical training on porcine models. We split the data into segments, and labeled each either as a segment that immediately precedes a camera movement, or one that does not. Due to the large class imbalance, we trained an ensemble of networks, each on a balanced sub-set of the training data. We found that the instruments' kinematic data can be used to predict when camera movements will occur, and evaluated the performance on different segment durations and ensemble sizes. We also studied how much in advance an upcoming camera movement can be predicted, and found that predicting a camera movement 0.25, 0.5, and 1 second before it occurred achieved 98%, 94%, and 84% accuracy relative to the prediction of an imminent camera movement. This indicates that camera movement events can be predicted early enough to leave time for computing and executing an autonomous camera movement and suggests that an autonomous camera controller for RAMIS may one day be feasible.
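
The idea of training each ensemble member on a balanced subset can be sketched by undersampling the majority class once per member. This is a minimal sketch under assumptions: binary labels (1 = segment precedes a camera movement), random undersampling, and averaging of member outputs are illustrative choices, and the function names are hypothetical; the abstract does not state how the subsets were drawn or how the members were combined.

```python
import random


def balanced_subsets(labels, n_models, seed=0):
    """Return one index list per ensemble member, each with equal class counts.

    Every subset keeps all minority-class indices and draws an equally sized
    random sample (without replacement) from the majority class.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    subsets = []
    for _ in range(n_models):
        sampled = rng.sample(majority, len(minority))
        subsets.append(sorted(minority + sampled))
    return subsets


def ensemble_vote(probabilities, threshold=0.5):
    """Combine member outputs by averaging their predicted probabilities."""
    return sum(probabilities) / len(probabilities) >= threshold
```

Each index list would then be used to train one network, so that every member sees a balanced class ratio while the ensemble as a whole covers more of the majority class.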

【3】 A Profile-Based Binary Feature Extraction Method Using Frequent Itemsets for Improving Coronary Artery Disease Diagnosis
标题：一种改进冠心病诊断的基于轮廓的频繁项集二值特征提取方法
链接：https://arxiv.org/abs/2109.10966

作者：Ali Yavari,Amir Rajabzadeh,Fardin Abdali-Mohammadi
机构：Department of Electrical and Computer Engineering, Razi University, Kermanshah, Iran
备注：11 pages, 2 figures
摘要：近年来，人们对使用机器学习方法诊断冠心病（CAD）越来越感兴趣，以降低传统诊断的成本和健康影响。介绍了一种基于轮廓的二值特征提取（PBBFE）的CAD诊断方法。该方法在对数值特征进行划分后，利用Apriori算法提取频繁项集作为特征，以提高CAD诊断的准确性。提出的方法包括两个主要阶段。在第一阶段，根据年龄、性别和医疗状况为每个患者分配一个轮廓，然后根据分配的轮廓对所有数字特征进行离散化。然后，所有特征都要经过二值化过程，以便通过Apriori进行特征提取。在该阶段的最后一步，Apriori从数据集中提取频繁项集，并用于构建新的数据集。在第二阶段，利用遗传算法和支持向量机来识别提取特征的最佳子集进行分类。该方法在CAD领域最丰富的数据库Z-Alizadeh Sani数据集上进行了测试。在该数据集上进行的性能比较表明，该方法优于所有主要替代方法，准确率为98.35%，敏感性为100%，特异性为94.25%。所提出的方法在其他几个数据集上也达到了最高的精度。
摘要：Recent years have seen growing interest in the diagnosis of Coronary Artery Disease (CAD) with machine learning methods to reduce the cost and health implications of conventional diagnosis. This paper introduces a CAD diagnosis method with a novel feature extraction technique called the Profile-Based Binary Feature Extraction (PBBFE). In this method, after partitioning numerical features, frequent itemsets are extracted by the Apriori algorithm and then used as features to increase the CAD diagnosis accuracy. The proposed method consists of two main phases. In the first phase, each patient is assigned a profile based on age, gender, and medical condition, and then all numerical features are discretized based on assigned profiles. All features then undergo a binarization process to become ready for feature extraction by Apriori. In the last step of this phase, frequent itemsets are extracted from the dataset by Apriori and used to build a new dataset. In the second phase, the Genetic Algorithm and the Support Vector Machine are used to identify the best subset of extracted features for classification. The proposed method was tested on the Z-Alizadeh Sani dataset, which is one of the richest databases in the field of CAD. Performance comparisons conducted on this dataset showed that the proposed method outperforms all major alternative methods with 98.35% accuracy, 100% sensitivity, and 94.25% specificity. The proposed method also achieved the highest accuracy on several other datasets.

蒸馏|知识提取(1篇)

【1】 Dynamic Knowledge Distillation for Pre-trained Language Models
标题：面向预训练语言模型的动态知识蒸馏
链接：https://arxiv.org/abs/2109.11295

作者：Lei Li,Yankai Lin,Shuhuai Ren,Peng Li,Jie Zhou,Xu Sun
机构：MOE Key Laboratory of Computational Linguistics, School of EECS, Peking University; Pattern Recognition Center, WeChat AI, Tencent Inc., China
备注：Main Conference EMNLP 2021, Camera Ready
摘要：知识蒸馏（KD）已被证明是压缩大规模预训练语言模型的有效方法。然而，现有方法静态地进行KD，例如，学生模型在预定义的训练数据集上将其输出分布与选定教师模型的输出分布对齐。在这篇论文中，我们探讨了一种动态的知识蒸馏，它使学生能够根据自身能力调整学习过程，并从学生的表现和学习效率两方面进行考察。我们从教师模型选用、数据选择和KD目标适应三个方面探讨了动态调整。实验结果表明：（1）选择合适的教师模型可以提高学生模型的性能；（2）仅使用10%的信息量丰富的实例进行KD，即可在大大加快训练速度的同时取得相当的性能；（3）通过调整不同对齐目标的监督贡献，可以提高学生的性能。我们发现动态知识蒸馏是有前景的，并讨论了未来更高效的KD方法的潜在方向。我们的代码可在 https://github.com/lancopku/DynamicKD 获取。
摘要：Knowledge distillation (KD) has been proven effective for compressing large-scale pre-trained language models. However, existing methods conduct KD statically, e.g., the student model aligns its output distribution to that of a selected teacher model on the pre-defined training dataset. In this paper, we explore a dynamic knowledge distillation that empowers the student to adjust the learning procedure according to its competency, with regard to student performance and learning efficiency. We explore the dynamic adjustments in three aspects: teacher model adoption, data selection, and KD objective adaptation. Experimental results show that (1) proper selection of the teacher model can boost the performance of the student model; (2) conducting KD with 10% informative instances achieves comparable performance while greatly accelerating the training; (3) the student performance can be boosted by adjusting the supervision contribution of different alignment objectives. We find dynamic knowledge distillation is promising and provide discussions on potential future directions towards more efficient KD methods. Our code is available at https://github.com/lancopku/DynamicKD.

聚类(4篇)

【1】 Fast Density Estimation for Density-based Clustering Methods
标题：基于密度的聚类方法的快速密度估计
链接：https://arxiv.org/abs/2109.11383

作者：Difei Cheng,Ruihang Xu,Bo Zhang
机构：LSEC and Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
备注：8 pages, 17 figures, 1 table
摘要：基于密度的聚类算法在模式识别和机器学习中被广泛用于发现聚类，因为它们可以处理非超球形聚类，并且对异常值具有鲁棒性。然而，基于密度的算法在运行时主要依赖于寻找邻域和计算每个点的密度，这非常耗时。为了解决这一问题，本文提出了一种基于密度的聚类框架，该框架使用快速主成分分析，可以应用于基于密度的方法中，在查找邻居和估计密度时减少不必要的距离计算。将该聚类框架应用于基于密度的含噪应用空间聚类（DBSCAN）算法中，得到了一种改进的DBSCAN（称为IDBSCAN），它保留了DBSCAN的优点，同时大大减少了冗余距离的计算。在五个基准数据集上的实验表明，提出的IDBSCAN算法显著提高了计算效率。
摘要：Density-based clustering algorithms are widely used for discovering clusters in pattern recognition and machine learning since they can deal with non-hyperspherical clusters and are robust to outliers. However, the runtime of density-based algorithms is heavily dominated by finding neighbors and calculating the density of each point, which is time-consuming. To address this issue, this paper proposes a density-based clustering framework using fast principal component analysis, which can be applied to density-based methods to prune unnecessary distance calculations when finding neighbors and estimating densities. By applying this clustering framework to the Density Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, an improved DBSCAN (called IDBSCAN) is obtained, which preserves the advantages of DBSCAN while greatly reducing the computation of redundant distances. Experiments on five benchmark datasets demonstrate that the proposed IDBSCAN algorithm improves the computational efficiency significantly.
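
A projection can prune distance calculations because the distance between two points along any unit direction never exceeds their Euclidean distance. The sketch below uses that bound for an eps-neighborhood search. It is an illustration of the general idea only, not the IDBSCAN procedure: the `direction` argument stands in for a principal component (computing it is omitted here), and the function name and return values are hypothetical.

```python
import math


def neighbors_with_pruning(points, direction, eps):
    """Find all pairs within distance eps, skipping pairs ruled out by projection.

    Points are sorted by their projection onto a unit direction; once the gap
    between two projections exceeds eps, the true distance must exceed eps too,
    so the inner loop can stop early.
    Returns (neighbors, computed), where computed counts full distance evaluations.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    proj = [sum(x * u for x, u in zip(p, unit)) for p in points]
    order = sorted(range(len(points)), key=proj.__getitem__)
    neighbors = {i: [] for i in range(len(points))}
    computed = 0
    for a, i in enumerate(order):
        for j in order[a + 1:]:
            if proj[j] - proj[i] > eps:
                break  # every later point is even farther along the projection
            computed += 1
            if math.dist(points[i], points[j]) <= eps:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors, computed
```

On data that is spread out along the chosen direction, `computed` can be far smaller than the number of all point pairs, while the neighbor lists are the same as those from a brute-force search.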

【2】 A Framework for Cluster and Classifier Evaluation in the Absence of Reference Labels
标题：一种无参考标签的聚类和分类器评估框架
链接：https://arxiv.org/abs/2109.11126

作者：Robert J. Joyce,Edward Raff,Charles Nicholas
机构：Booz Allen Hamilton, University of Maryland, Baltimore County, USA
备注：to appear in Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security
摘要：在一些问题空间中，获取地面真相标签的高成本要求使用低质量的参考数据集。使用这些数据集很难对模型性能进行基准测试，因为评估结果可能有偏差。我们提出了一个使用参考标签的补充，我们称之为近似地面真值精化（AGTR）。使用AGTR，我们证明了用于评估聚类算法和多类分类器的特定度量的边界可以在没有参考标签的情况下计算。我们还介绍了一种使用AGTR识别质量可疑数据集产生的不准确评估结果的程序。创建AGTR需要领域知识，恶意软件族分类是一项具有强大领域知识方法的任务，支持AGTR的构建。我们将AGTR评估框架应用于一个流行的恶意软件标记工具，以诊断先前测试中的过度拟合，并评估在先前数据下无法有效量化其影响的变化，从而展示我们的AGTR评估框架。
摘要：In some problem spaces, the high cost of obtaining ground truth labels necessitates use of lower quality reference datasets. It is difficult to benchmark model performance using these datasets, as evaluation results may be biased. We propose a supplement to using reference labels, which we call an approximate ground truth refinement (AGTR). Using an AGTR, we prove that bounds on specific metrics used to evaluate clustering algorithms and multi-class classifiers can be computed without reference labels. We also introduce a procedure that uses an AGTR to identify inaccurate evaluation results produced from datasets of dubious quality. Creating an AGTR requires domain knowledge, and malware family classification is a task with robust domain knowledge approaches that support the construction of an AGTR. We demonstrate our AGTR evaluation framework by applying it to a popular malware labeling tool to diagnose over-fitting in prior testing and evaluate changes whose impact could not be meaningfully quantified under previous data.

【3】 Clustering performance analysis using new correlation based cluster validity indices
标题：使用新的基于相关性的聚类有效性指标进行聚类性能分析
链接：https://arxiv.org/abs/2109.11172

作者：Nathakhun Wiroonsri
机构：Mathematics and Statistics with Applications Research Group (MaSA), Department of Mathematics, King Mongkut’s University of Technology Thonburi
备注：20 pages
摘要：有各种聚类有效性度量用于评估聚类结果。使用这些度量的主要目的之一是寻求最优的未知聚类数。有些度量对于密度、大小和形状不同的簇很有效。然而，这些有效性度量的一个共同弱点是，它们有时只提供一个明确的最佳集群数量。这个数字实际上是未知的，用户可能希望根据不同的应用程序选择多个潜在的次优选项。我们基于一对数据点之间的实际距离和两个点所在的簇的质心距离之间的相关性，开发了两个新的簇有效性指数。我们提出的指数在不同数量的簇上不断产生几个峰值，克服了前面提到的缺点。此外，引入的相关性还可用于评估所选聚类结果的质量。为了将提出的有效性指标与几个著名的有效性指标进行比较，在不同的场景（包括著名的iris数据集和真实的营销应用程序）中进行了若干实验。
摘要：There are various cluster validity measures used for evaluating clustering results. One of the main objectives of using these measures is to seek the optimal unknown number of clusters. Some measures work well for clusters with different densities, sizes and shapes. Yet, one weakness that those validity measures share is that they sometimes provide only one clear optimal number of clusters. That number is actually unknown and there might be more than one potential sub-optimal option that a user may wish to choose based on different applications. We develop two new cluster validity indices based on a correlation between the actual distance between a pair of data points and the centroid distance of the clusters in which the two points are located. Our proposed indices consistently yield several peaks at different numbers of clusters, which overcomes the weakness previously stated. Furthermore, the introduced correlation can also be used for evaluating the quality of a selected clustering result. Several experiments in different scenarios including the well-known iris data set and a real-world marketing application have been conducted in order to compare the proposed validity indices with several well-known ones.
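
The correlation described above can be sketched directly: for every pair of points, compare their actual distance with the distance between the centroids of the clusters they are assigned to, then take the correlation of the two lists. This is a minimal sketch assuming Euclidean distance and the Pearson coefficient; the paper's two indices are built on such a correlation but their exact definitions are not given in the abstract, and the function names here are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined; reported as 0 in this sketch
    return cov / math.sqrt(vx * vy)


def centroid_distance_correlation(points, labels):
    """Correlate pairwise point distances with distances of their cluster centroids."""
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    centroids = {
        c: tuple(sum(col) / len(ps) for col in zip(*ps))
        for c, ps in clusters.items()
    }
    point_d, centroid_d = [], []
    for i, j in combinations(range(len(points)), 2):
        point_d.append(math.dist(points[i], points[j]))
        centroid_d.append(math.dist(centroids[labels[i]], centroids[labels[j]]))
    return pearson(point_d, centroid_d)
```

A labeling that matches the geometry of the data gives a correlation close to 1, while a labeling that mixes distant points into the same cluster gives a lower value, so the score can be compared across candidate numbers of clusters.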

【4】 Quantile-based fuzzy C-means clustering of multivariate time series: Robust techniques
标题：基于分位数的多变量时间序列模糊C均值聚类：鲁棒技术
链接：https://arxiv.org/abs/2109.11027

作者：Ángel López-Oriona,Pierpaolo D'Urso,José Antonio Vilar,Borja Lafuente-Rego
机构：Research Group MODES, Research Center for Information and Communication Technologies (CITIC), University of A Coruña, A Coruña, Spain; Technological Institute for Industrial Mathematics (ITMATI), Spain
备注：arXiv admin note: text overlap with arXiv:2109.03728
摘要：从生成过程的角度，提出了三种稳健的多元时间序列聚类方法。这些方法是模糊C-均值模型的稳健版本，该模型基于：（i）分位数互谱密度的估计和（ii）经典主成分分析。通过使用所谓的度量、噪声和修剪方法来实现对异常值的鲁棒性。度量方法在目标函数中纳入了一个旨在抵消离群值影响的距离度量，噪声方法构建了一个预期包含离群序列的人工聚类，修剪方法则剔除数据集中最非典型的序列。所有提出的技术都继承了分位数互谱密度的优良特性，例如能够揭示一般类型的相关性。一项广泛的模拟研究（包括多元线性、非线性和GARCH过程）的结果表明，这些算法在处理离群序列（即表现出不同于大多数序列的依赖结构的序列）方面非常有效，明显优于其他替代方法。通过金融和环境序列的两个具体应用，突出了所提方法的实用性。
摘要：Three robust methods for clustering multivariate time series from the point of view of generating processes are proposed. The procedures are robust versions of a fuzzy C-means model based on: (i) estimates of the quantile cross-spectral density and (ii) the classical principal component analysis. Robustness to the presence of outliers is achieved by using the so-called metric, noise and trimmed approaches. The metric approach incorporates in the objective function a distance measure aimed at neutralizing the effect of the outliers, the noise approach builds an artificial cluster expected to contain the outlying series and the trimmed approach eliminates the most atypical series in the dataset. All the proposed techniques inherit the nice properties of the quantile cross-spectral density, such as being able to uncover general types of dependence. Results from a broad simulation study including multivariate linear, nonlinear and GARCH processes indicate that the algorithms are substantially effective in coping with the presence of outlying series (i.e., series exhibiting a dependence structure different from that of the majority), clearly outperforming alternative procedures. The usefulness of the suggested methods is highlighted by means of two specific applications regarding financial and environmental series.

自动驾驶|车辆|车道检测等(1篇)

【1】 PredictionNet: Real-Time Joint Probabilistic Traffic Prediction for Planning, Control, and Simulation
标题：PredictionNet：面向规划、控制和仿真的实时联合概率交通预测
链接：https://arxiv.org/abs/2109.11094

作者：Alexey Kamenev,Lirui Wang,Ollin Boer Bohan,Ishwar Kulkarni,Bilal Kartal,Artem Molchanov,Stan Birchfield,David Nistér,Nikolai Smolyanskiy
机构：NVIDIA
备注：6 pages, 8 figures, submission to ICRA 2022 conference, for associated video file, see this https URL
摘要：预测交通代理的未来运动对于安全高效的自主驾驶至关重要。为此，我们提出了PredictionNet，这是一种深度神经网络（DNN），用于预测所有周围交通代理的运动以及ego车辆的运动。所有预测都是概率性的，并以简单的自上而下光栅化表示，该光栅化允许任意数量的代理。以具有车道信息的多层地图为条件，网络在一次通过中联合输出所有代理（包括ego车辆）的未来位置、速度和回溯向量。然后从输出中提取轨迹。该网络可以用来模拟真实的流量，并在流行的基准上产生有竞争力的结果。更重要的是，通过将其与运动规划/控制子系统相结合，它已被用于成功控制数百公里的真实世界车辆。该网络在嵌入式GPU上的运行速度比实时速度快，并且由于选择了输入表示，该系统显示出良好的泛化（跨感官模式和位置）。此外，我们还证明，通过使用强化学习（RL）扩展DNN，它可以更好地处理罕见或不安全的事件，如攻击性机动和碰撞。
摘要：Predicting the future motion of traffic agents is crucial for safe and efficient autonomous driving. To this end, we present PredictionNet, a deep neural network (DNN) that predicts the motion of all surrounding traffic agents together with the ego-vehicle's motion. All predictions are probabilistic and are represented in a simple top-down rasterization that allows an arbitrary number of agents. Conditioned on a multilayer map with lane information, the network outputs future positions, velocities, and backtrace vectors jointly for all agents including the ego-vehicle in a single pass. Trajectories are then extracted from the output. The network can be used to simulate realistic traffic, and it produces competitive results on popular benchmarks. More importantly, it has been used to successfully control a real-world vehicle for hundreds of kilometers, by combining it with a motion planning/control subsystem. The network runs faster than real-time on an embedded GPU, and the system shows good generalization (across sensory modalities and locations) due to the choice of input representation. Furthermore, we demonstrate that by extending the DNN with reinforcement learning (RL), it can better handle rare or unsafe events like aggressive maneuvers and crashes.

点云|SLAM|雷达|激光|深度RGBD相关(2篇)

【1】 Arbitrary-Depth Universal Approximation Theorems for Operator Neural Networks
标题：算子神经网络的任意深度通用逼近定理
链接：https://arxiv.org/abs/2109.11354

作者：Annan Yu,Chloé Becquey,Diana Halikias,Matthew Esmaili Mallory,Alex Townsend
机构：Cornell University, University of California
备注：12 pages
摘要：算子神经网络（NNs）的标准通用逼近定理适用于任意宽度和有界深度。在这里，我们证明了有界宽度和任意深度的算子NNs是连续非线性算子的通用逼近器。在我们的主要结果中，我们证明了对于在具有非零导数的点上连续可微的非多项式激活函数，可以构造一个宽度为5的算子NN，其输入为具有有限十进制表示的实数，任意接近于任何给定的连续非线性算子。我们得到了非仿射多项式激活函数的一个类似结果。我们还表明，通过构造深度为$2k^3+8$的算子ReLU NNs和常数宽度，深度具有理论上的优势，除非其宽度在$k$中呈指数形式，否则该算子ReLU NNs不能很好地近似于深度为$k$的算子ReLU NN。
摘要：The standard Universal Approximation Theorem for operator neural networks (NNs) holds for arbitrary width and bounded depth. Here, we prove that operator NNs of bounded width and arbitrary depth are universal approximators for continuous nonlinear operators. In our main result, we prove that for non-polynomial activation functions that are continuously differentiable at a point with a nonzero derivative, one can construct an operator NN of width five, whose inputs are real numbers with finite decimal representations, that is arbitrarily close to any given continuous nonlinear operator. We derive an analogous result for non-affine polynomial activation functions. We also show that depth has theoretical advantages by constructing operator ReLU NNs of depth $2k^3+8$ and constant width that cannot be well-approximated by any operator ReLU NN of depth $k$, unless its width is exponential in $k$.

【2】 Multi-resolution deep learning pipeline for dense large scale point clouds
标题：密集大规模点云的多分辨率深度学习流水线
链接：https://arxiv.org/abs/2109.11311

作者：Thomas Richard,Florent Dupont,Guillaume Lavoue
机构：CNRS, Université de Lyon, LIRIS, France
摘要：3D传感器的最新发展允许获取大规模场景中极其密集的3D点云。处理如此大的点云的主要挑战仍然是数据的大小，这会导致昂贵的计算和内存成本。在这种情况下，全分辨率云尤其难以处理，它带来的细节很少被利用。虽然细粒度细节对于检测小对象很重要，但它们可能会改变大型结构部件的局部几何结构，并误导深度学习网络。在本文中，我们介绍了一种新的通用深度学习管道，以利用大规模点云的全部精度，但仅适用于需要详细信息的对象。我们的方法的核心思想是将过程分成多个子网络，这些子网络以不同的分辨率运行，每个子网络都有其特定的类来检索。因此，管道允许每个类从子采样的噪声和内存成本降低或细粒度细节中获益。
摘要：Recent development of 3D sensors allows the acquisition of extremely dense 3D point clouds of large-scale scenes. The main challenge of processing such large point clouds remains the size of the data, which induces expensive computational and memory costs. In this context, the full-resolution cloud is particularly hard to process, and the details it brings are rarely exploited. Although fine-grained details are important for detection of small objects, they can alter the local geometry of large structural parts and mislead deep learning networks. In this paper, we introduce a new generic deep learning pipeline to exploit the full precision of large-scale point clouds, but only for objects that require details. The core idea of our approach is to split up the process into multiple sub-networks that operate on different resolutions, each with its own specific classes to retrieve. Thus, the pipeline allows each class to benefit either from the noise and memory cost reduction of sub-sampling or from fine-grained details.

推理|分析|理解|解释(1篇)

【1】 Security Analysis of Capsule Network Inference using Horizontal Collaboration
标题：基于横向协作的胶囊网络推理安全性分析
链接：https://arxiv.org/abs/2109.11041

作者：Adewale Adeyemo,Faiq Khalid,Tolulope A. Odetola,Syed Rafay Hasan
机构：Department of Electrical and Computer Engineering, Tennessee Technological University, Cookeville, TN, USA, Department of Computer Engineering, Vienna University of Technology, Wien, Austria
摘要：传统的卷积神经网络（CNN）存在毕加索效应和池层信息丢失等缺点。胶囊网络（CapsNet）是为了应对这些挑战而提出的，因为它的结构可以编码并保持输入图像的空间方向。与传统CNN类似，CapsNet也容易受到一些恶意攻击，正如一些研究人员在文献中所研究的那样。然而，这些研究大多集中于基于单个设备的推理，但最先进系统中的横向协作推理，如自动驾驶汽车中的智能边缘服务、语音控制系统和无人机，使大多数分析无效。水平协作意味着将经过训练的CNN模型或CNN任务划分到多个终端设备或边缘节点。因此，在水平协作环境中部署时，必须检查CapsNet对恶意攻击的鲁棒性。为此，我们研究了在水平协作环境中，当受到基于噪声的推理攻击时，CapsNet的鲁棒性。在本分析中，我们使用两种基于噪声的攻击，即高斯噪声攻击和FGSM噪声攻击，扰动了四个DNN模型不同层的特征图，即CapsNet、Mini VGG、LeNet和内部设计的CNN（ConvNet），其参数数量与CapsNet相同。实验结果表明，与传统的CNN相似，根据攻击者对DNN层的访问情况，CapsNet的分类精度显著下降。例如，当在CapsNet的DigitCap层执行高斯噪声攻击分类时，最大分类精度下降约为97%。
摘要：Traditional convolutional neural networks (CNNs) have several drawbacks, like the Picasso effect and the loss of information by the pooling layer. The Capsule network (CapsNet) was proposed to address these challenges because its architecture can encode and preserve the spatial orientation of input images. Similar to traditional CNNs, CapsNet is also vulnerable to several malicious attacks, as studied by several researchers in the literature. However, most of these studies focus on single-device-based inference, whereas horizontally collaborative inference in state-of-the-art systems, like intelligent edge services in self-driving cars, voice controllable systems, and drones, nullifies most of these analyses. Horizontal collaboration implies partitioning the trained CNN models or CNN tasks to multiple end devices or edge nodes. Therefore, it is imperative to examine the robustness of the CapsNet against malicious attacks when deployed in horizontally collaborative environments. Towards this, we examine the robustness of the CapsNet when subjected to noise-based inference attacks in a horizontal collaborative environment. In this analysis, we perturbed the feature maps of the different layers of four DNN models, i.e., CapsNet, Mini-VGG, LeNet, and an in-house designed CNN (ConvNet) with the same number of parameters as CapsNet, using two types of noise-based attacks, i.e., Gaussian Noise Attack and FGSM noise attack. The experimental results show that, similar to the traditional CNNs, depending upon the access of the attacker to the DNN layer, the classification accuracy of the CapsNet drops significantly. For example, when Gaussian Noise Attack classification is performed at the DigitCap layer of the CapsNet, the maximum classification accuracy drop is approximately 97%.

检测相关(4篇)

【1】 LSTM Hyper-Parameter Selection for Malware Detection: Interaction Effects and Hierarchical Selection Approach
标题：用于恶意软件检测的LSTM超参数选择：交互效应和分层选择方法
链接：https://arxiv.org/abs/2109.11500

作者：Mohit Sewak,Sanjay K. Sahay,Hemant Rathore
机构：Microsoft R&D, India, BITS Pilani, Goa, India
摘要：长短时记忆（LSTM）网络在基于人工智能（AI）的语言建模中显示出巨大的潜力。最近，LSTM网络也被广泛用于设计基于人工智能的入侵检测系统（IDS）。然而，它在IDS中的适用性主要是在语言模型所用的默认设置下研究的，而安全应用具有独特的条件，因此在应用此类循环网络时需要仔细考虑。因此，我们对IDS的LSTM超参数进行了最详尽的研究之一，并对大约150个LSTM配置进行了实验，以确定其超参数的相对重要性、交互效应以及设计IDS的最佳选择方法。我们对这些实验的结果进行了多重分析，并对不同超参数协变量水平的交互效应进行了经验控制。我们发现，对于安全应用，特别是设计IDS，适用于语言模型的相对重要性排序并不成立，超参数选择的标准线性方法也不理想。我们确定，交互效应在确定超参数的相对重要性方面起着至关重要的作用。我们还发现，在控制交互效应后，对于IDS来说，LSTM超参数的正确相对重要性依次是批量大小（batch-size）、dropout比率和填充（padding）。这些发现意义重大，因为当LSTM首次用于语言模型时，重点主要放在增加层数以提高性能上。
摘要：Long-Short-Term-Memory (LSTM) networks have shown great promise in artificial intelligence (AI) based language modeling. Recently, LSTM networks have also become popular for designing AI-based Intrusion Detection Systems (IDS). However, their applicability in IDS has been studied largely in the default settings used in language models, whereas security applications offer distinct conditions and hence warrant careful consideration when applying such recurrent networks. Therefore, we conducted one of the most exhaustive works on LSTM hyper-parameters for IDS and experimented with approx. 150 LSTM configurations to determine the relative importance of the hyper-parameters, their interaction effects, and the optimal selection approach for designing an IDS. We conducted multiple analyses of the results of these experiments and empirically controlled for the interaction effects of different hyper-parameters' covariate levels. We found that for security applications, especially for designing an IDS, neither is the relative importance applicable to language models valid, nor is the standard linear method for hyper-parameter selection ideal. We ascertained that the interaction effect plays a crucial role in determining the relative importance of hyper-parameters. We also discovered that after controlling for the interaction effect, the correct relative importance for LSTMs for an IDS is batch-size, followed by dropout ratio and padding. The findings are significant because when LSTM was first used for language models, the focus had mostly been on increasing the number of layers to enhance performance.

【2】 DeepAID: Interpreting and Improving Deep Learning-based Anomaly Detection in Security Applications
标题：DeepAID：对安全应用中基于深度学习的异常检测的解释和改进
链接：https://arxiv.org/abs/2109.11495

作者：Dongqi Han,Zhiliang Wang,Wenqi Chen,Ying Zhong,Su Wang,Han Zhang,Jiahai Yang,Xingang Shi,Xia Yin
机构：Institute for Network Sciences and Cyberspace, BNRist, Tsinghua University, Beijing, China, Department of Computer Science and Technology, BNRist, Tsinghua University, Beijing, China
备注：Accepted by 2021 ACM SIGSAC Conference on Computer and Communications Security (CCS '21)
摘要：无监督深度学习（DL）技术由于能够检测不可预见的威胁以及深度神经网络（DNN）提供的优越性能，已广泛应用于各种安全相关的异常检测应用中。然而，缺乏可解释性造成了在实践中采用DL模型的关键障碍。不幸的是，现有的解释方法是针对有监督学习模型和/或非安全领域提出的，这些方法不适用于无监督DL模型，并且不能满足安全领域的特殊要求。在本文中，我们提出了DeepAID，这是一个通用框架，旨在（1）解释安全领域中基于DL的异常检测系统，（2）基于这些解释提高这些系统的实用性。我们首先提出了一种新的无监督DNN解释方法，该方法通过制定和解决具有特定安全域约束的精心设计的优化问题。然后，我们提供了几个基于解释器的应用程序，以及一个基于模型的扩展蒸馏器，通过解决特定领域的问题来改进安全系统。我们将DeepAID应用于三种类型的安全相关异常检测系统，并使用具有代表性的先前工作广泛评估我们的解释器。实验结果表明，DeepAID能够在满足安全域特殊需求的同时，为无监督DL模型提供高质量的解释。我们还提供了几个用例来说明DeepAID可以帮助安全运营商理解模型决策、诊断系统错误、向模型提供反馈以及减少误报。
摘要：Unsupervised Deep Learning (DL) techniques have been widely used in various security-related anomaly detection applications, owing to the great promise of being able to detect unforeseen threats and superior performance provided by Deep Neural Networks (DNN). However, the lack of interpretability creates key barriers to the adoption of DL models in practice. Unfortunately, existing interpretation approaches are proposed for supervised learning models and/or non-security domains, which are unadaptable for unsupervised DL models and fail to satisfy special requirements in security domains. In this paper, we propose DeepAID, a general framework aiming to (1) interpret DL-based anomaly detection systems in security domains, and (2) improve the practicality of these systems based on the interpretations. We first propose a novel interpretation method for unsupervised DNNs by formulating and solving well-designed optimization problems with special constraints for security domains. Then, we provide several applications based on our Interpreter as well as a model-based extension Distiller to improve security systems by solving domain-specific problems. We apply DeepAID over three types of security-related anomaly detection systems and extensively evaluate our Interpreter with representative prior works. Experimental results show that DeepAID can provide high-quality interpretations for unsupervised DL models while meeting the special requirements of security domains. We also provide several use cases to show that DeepAID can help security operators to understand model decisions, diagnose system mistakes, give feedback to models, and reduce false positives.

【3】 An Evaluation of Anomaly Detection and Diagnosis in Multivariate Time Series
标题：多变量时间序列中异常检测与诊断的评价
链接：https://arxiv.org/abs/2109.11428

作者：Astha Garg,Wenyu Zhang,Jules Samaran,Savitha Ramasamy,Chuan-Sheng Foo
备注：IEEE Transactions on Neural Networks and Learning Systems
摘要：最近，人们提出了几种多变量时间序列异常检测技术，但缺乏在一组共同的数据集和度量上的系统比较。本文对基于无监督和半监督深度学习的网络物理系统多变量时间序列数据异常检测和诊断方法进行了系统和全面的评价。与以前的工作不同，我们通过一个由10个模型和4个评分函数组成的网格，使模型与模型误差的后处理（即评分函数）彼此独立地变化，并将这些变体与最先进的方法进行比较。在时间序列异常检测中，检测异常事件比检测单个异常时间点更为重要。通过实验，我们发现现有的评价指标要么不考虑事件，要么不能区分好的检测器和平凡的检测器，如随机检测器或全正检测器。我们提出了一种新的度量来克服这些缺点，即用于评估时间序列异常检测的复合F分数（$Fc_1$）。我们的研究强调，在多变量时间序列异常检测中，动态评分函数比静态评分函数有效得多，评分函数的选择往往比基础模型的选择更重要。我们还发现，一个简单的逐通道模型，即具有动态高斯评分函数的单变量全连接自动编码器，击败了最先进的算法，成为异常检测和诊断的最佳候选。
摘要：Several techniques for multivariate time series anomaly detection have been proposed recently, but a systematic comparison on a common set of datasets and metrics is lacking. This paper presents a systematic and comprehensive evaluation of unsupervised and semi-supervised deep-learning based methods for anomaly detection and diagnosis on multivariate time series data from cyberphysical systems. Unlike previous works, we vary the model and post-processing of model errors, i.e. the scoring functions independently of each other, through a grid of 10 models and 4 scoring functions, comparing these variants to state of the art methods. In time-series anomaly detection, detecting anomalous events is more important than detecting individual anomalous time-points. Through experiments, we find that the existing evaluation metrics either do not take events into account, or cannot distinguish between a good detector and trivial detectors, such as a random or an all-positive detector. We propose a new metric to overcome these drawbacks, namely, the composite F-score ($Fc_1$), for evaluating time-series anomaly detection. Our study highlights that dynamic scoring functions work much better than static ones for multivariate time series anomaly detection, and the choice of scoring functions often matters more than the choice of the underlying model. We also find that a simple, channel-wise model - the Univariate Fully-Connected Auto-Encoder, with the dynamic Gaussian scoring function emerges as a winning candidate for both anomaly detection and diagnosis, beating state of the art algorithms.
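
A minimal sketch of an event-aware metric in the spirit of the composite F-score mentioned above. The abstract does not spell out the formula; the definition used here (harmonic mean of point-wise precision and event-wise recall) is an assumption, and the function names `events` and `composite_f1` are hypothetical.

```python
def events(labels):
    """Return (start, end) pairs, end exclusive, for each contiguous run of 1s."""
    runs, start = [], None
    for i, v in enumerate(labels):
        if v == 1 and start is None:
            start = i
        elif v != 1 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(labels)))
    return runs


def composite_f1(labels, preds):
    """Assumed definition: harmonic mean of point-wise precision and event-wise recall."""
    tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    precision_t = tp / (tp + fp) if (tp + fp) else 0.0
    runs = events(labels)
    # An event counts as detected if at least one of its points is flagged.
    detected = sum(1 for s, e in runs if any(preds[s:e]))
    recall_e = detected / len(runs) if runs else 0.0
    if precision_t + recall_e == 0:
        return 0.0
    return 2 * precision_t * recall_e / (precision_t + recall_e)


labels = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0]   # two anomalous events
preds = [0, 0, 1, 0, 0, 1, 0, 0, 0, 0]    # one hit inside event 1, one false alarm
score = composite_f1(labels, preds)
```

Under this reading, a detector is rewarded for touching each event at least once rather than for covering every anomalous time point, while false alarms are still penalised point by point.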

【4】 Alzheimers Dementia Detection using Acoustic & Linguistic features and Pre-Trained BERT
标题：基于声学语言特征和预训练BERT的阿尔茨海默病检测
链接：https://arxiv.org/abs/2109.11010

作者：Akshay Valsaraj,Ithihas Madala,Nikhil Garg,Veeky Baths
机构：Cognitive Neuroscience Lab, BITS Pilani, K.K. Birla Goa Campus, Goa, India
摘要：阿尔茨海默病是一种致命的进行性脑部疾病，随着时间的推移而恶化。现在是时候我们有廉价和快速的临床诊断技术早期检测和护理。在以前的研究中，各种机器学习技术和预先训练的深度学习模型被用于各种声学和语言特征的提取。我们的研究集中于ADReSS（通过自发语音识别阿尔茨海默痴呆症）2021挑战中分类任务的三个模型。我们使用ADReSS挑战提供的平衡良好的数据集来训练和验证我们的模型。模型1使用来自eGeMAPs特征集的各种声学特征，模型2使用我们从自动生成的转录本生成的各种语言特征，模型3使用自动生成的转录本直接使用预先训练的BERT和TF-IDF提取特征。这些模型在模型部分中有详细描述。
摘要：Alzheimer's disease is a fatal progressive brain disorder that worsens with time. It is high time we had inexpensive and quick clinical diagnostic techniques for early detection and care. In previous studies, various Machine Learning techniques and Pre-trained Deep Learning models have been used in conjunction with the extraction of various acoustic and linguistic features. Our study focuses on three models for the classification task in the ADReSS (The Alzheimer's Dementia Recognition through Spontaneous Speech) 2021 Challenge. We use the well-balanced dataset provided by the ADReSS Challenge for training and validating our models. Model 1 uses various acoustic features from the eGeMAPs feature-set, Model 2 uses various linguistic features that we generated from auto-generated transcripts, and Model 3 uses the auto-generated transcripts directly to extract features using a Pre-trained BERT and TF-IDF. These models are described in detail in the models section.

分类|识别(4篇)

【1】 Named Entity Recognition and Classification on Historical Documents: A Survey
标题：历史文献命名实体识别与分类研究综述
链接：https://arxiv.org/abs/2109.11406

作者：Maud Ehrmann,Ahmed Hamdi,Elvys Linhares Pontes,Matteo Romanello,Antoine Doucet
机构：University of La Rochelle
备注：39 pages
摘要：经过几十年的大规模数字化，史无前例的大量历史文件以数字格式提供，同时还有机器可读的文本。虽然这是在保存和可访问性方面向前迈出的一大步，但它也为内容挖掘带来了新的机遇，而下一个根本性挑战是开发适当的技术，以便从“过去的大数据”中高效地搜索、检索和探索信息。在语义索引的机会中，人文学者对命名实体的识别和分类需求很大。然而，命名实体识别（NER）系统面临着多样性、历史性和噪声输入的严峻挑战。在本次调查中，我们介绍了历史文件给NER带来的一系列挑战，清点了现有资源，描述了迄今为止部署的主要方法，并确定了未来发展的关键优先事项。
摘要：After decades of massive digitisation, an unprecedented amount of historical documents is available in digital format, along with their machine-readable texts. While this represents a major step forward with respect to preservation and accessibility, it also opens up new opportunities in terms of content mining and the next fundamental challenge is to develop appropriate technologies to efficiently search, retrieve and explore information from this 'big data of the past'. Among semantic indexing opportunities, the recognition and classification of named entities are in great demand among humanities scholars. Yet, named entity recognition (NER) systems are heavily challenged with diverse, historical and noisy inputs. In this survey, we present the array of challenges posed by historical documents to NER, inventory existing resources, describe the main approaches deployed so far, and identify key priorities for future developments.

【2】 Unbiased Loss Functions for Multilabel Classification with Missing Labels
标题：具有缺失标签的多标签分类的无偏损失函数
链接：https://arxiv.org/abs/2109.11282

作者：Erik Schultheis,Rohit Babbar
机构：Dept. of Computer Science at Aalto University, Konemiehentie, Espoo
摘要：本文考虑在标签以已知的丢失率独立丢失的情况下，二进制和多标签分类问题。在极端多标签分类（XMC）任务中，标签缺失是一种普遍存在的现象，例如将维基百科文章与数十万个可能的标签中的一小部分进行匹配，在这类任务中，任何注释员都不可能检查所有阴性样本的有效性。因此，倾向评分精度（在已知噪声模型下对精度-at-k的无偏估计）已成为XMC中的标准度量之一。很少有方法在训练阶段已经考虑到这个问题，并且所有方法都局限于损失函数，这些损失函数可以分解为每个标签的贡献之和。一种典型的训练方法是将多标签问题简化为一系列二元或多类问题，并且已经证明，如果代理任务在优化召回方面应保持一致，则由此产生的损失函数不可在标签上分解。因此，本文导出了不同多标签约化的唯一无偏估计，包括不可分解的多标签约化。这些估计器的方差增大，可能导致不适定优化问题，我们通过切换到凸上界来解决这些问题。一项实验研究进一步补充了理论考虑，表明转向无偏估计显著改变了偏差-方差权衡，因此可能需要更强的正则化，这在某些情况下可能会否定无偏估计的好处。
摘要：This paper considers binary and multilabel classification problems in a setting where labels are missing independently and with a known rate. Missing labels are a ubiquitous phenomenon in extreme multi-label classification (XMC) tasks, such as matching Wikipedia articles to a small subset out of the hundreds of thousands of possible tags, where no human annotator can possibly check the validity of all the negative samples. For this reason, propensity-scored precision -- an unbiased estimate for precision-at-k under a known noise model -- has become one of the standard metrics in XMC. Few methods take this problem into account already during the training phase, and all are limited to loss functions that can be decomposed into a sum of contributions from each individual label. A typical approach to training is to reduce the multilabel problem into a series of binary or multiclass problems, and it has been shown that if the surrogate task should be consistent for optimizing recall, the resulting loss function is not decomposable over labels. Therefore, this paper derives the unique unbiased estimators for the different multilabel reductions, including the non-decomposable ones. These estimators suffer from increased variance and may lead to ill-posed optimization problems, which we address by switching to convex upper-bounds. The theoretical considerations are further supplemented by an experimental study showing that the switch to unbiased estimators significantly alters the bias-variance trade-off and may thus require stronger regularization, which in some cases can negate the benefits of unbiased estimation.
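
A minimal sketch of the simplest case the abstract refers to: a decomposable, per-label loss when a true positive is observed only with a known probability. The reweighting below is the standard construction for this one-sided noise model, not necessarily the paper's estimator, and the names `bce`, `unbiased_loss` and `propensity` (one minus the missing rate) are illustrative assumptions.

```python
import math


def bce(y, p_hat):
    """Binary cross-entropy for a single label; p_hat is the predicted P(y=1)."""
    eps = 1e-12
    p_hat = min(max(p_hat, eps), 1.0 - eps)
    return -math.log(p_hat) if y == 1 else -math.log(1.0 - p_hat)


def unbiased_loss(observed, p_hat, propensity):
    """Loss on the observed label whose expectation equals the loss on the true label.

    Assumes a true positive is observed with probability `propensity`
    and a true negative is never observed as positive.
    """
    if observed == 1:
        w = 1.0 / propensity
        return w * bce(1, p_hat) + (1.0 - w) * bce(0, p_hat)
    return bce(0, p_hat)


def expected_loss_for_true_positive(p_hat, propensity):
    """Average the estimator over the two possible observations of a true positive."""
    return (propensity * unbiased_loss(1, p_hat, propensity)
            + (1.0 - propensity) * unbiased_loss(0, p_hat, propensity))
```

The estimator matches the clean loss in expectation, but for small propensities and confident predictions the observed-positive branch becomes strongly negative, which illustrates the increased variance and ill-posed optimization that the abstract addresses with convex upper bounds.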

【3】 Scenario Aware Speech Recognition: Advancements for Apollo Fearless Steps & CHiME-4 Corpora
标题：情景感知语音识别：Apollo Fearless Steps和CHiME-4语料库的进展
链接：https://arxiv.org/abs/2109.11086

作者：Szu-Jui Chen,Wei Xia,John H. L. Hansen
机构：Center for Robust Speech Systems (CRSS), University of Texas at Dallas, TX
备注：Accepted for ASRU 2021
摘要：在这项研究中，我们研究将三元组损失（triplet loss）用作ASR的一种替代特征表示。我们考虑一种通用的非语义语音表示TRILL，它以基于三元组损失的自监督准则训练，用于声学建模以表示每段音频的声学特性。该策略随后应用于CHiME-4语料库和CRSS-UTDallas Fearless Steps语料库，重点是由5个选定的NASA Apollo-11频道组成的100小时挑战语料库。对所提取嵌入的分析提供了必要的基础，可以根据声学区分特性将训练话语划分为不同的组。此外，我们还证明了基于三元组损失的嵌入在声学建模中的性能优于i-Vector，证实了三元组损失比说话人特征更有效。通过发音和静音概率建模等附加技术，再加上多风格训练，我们在Fearless Steps语料库的开发集和评估集上分别实现了+5.42%和+3.18%的相对WER改进。为了探索泛化能力，我们在CHiME-4的1通道赛道上进一步测试了相同的技术，并观察到真实测试数据上+11.90%的相对WER改进。
摘要：In this study, we propose to investigate triplet loss for the purpose of an alternative feature representation for ASR. We consider a general non-semantic speech representation, which is trained with a self-supervised criteria based on triplet loss called TRILL, for acoustic modeling to represent the acoustic characteristics of each audio. This strategy is then applied to the CHiME-4 corpus and CRSS-UTDallas Fearless Steps Corpus, with emphasis on the 100-hour challenge corpus which consists of 5 selected NASA Apollo-11 channels. An analysis of the extracted embeddings provides the foundation needed to characterize training utterances into distinct groups based on acoustic distinguishing properties. Moreover, we also demonstrate that triplet-loss based embedding performs better than i-Vector in acoustic modeling, confirming that the triplet loss is more effective than a speaker feature. With additional techniques such as pronunciation and silence probability modeling, plus multi-style training, we achieve a +5.42% and +3.18% relative WER improvement for the development and evaluation sets of the Fearless Steps Corpus. To explore generalization, we further test the same technique on the 1 channel track of CHiME-4 and observe a +11.90% relative WER improvement for real test data.
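
For readers unfamiliar with the triplet loss mentioned above, here is a minimal sketch of the generic formulation, not the TRILL training code. The Euclidean distance, the margin value and the function names are assumptions chosen for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin.

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


well_separated = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])   # negative is far away
violating = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])        # negative is closer than positive
```

In a self-supervised audio setting the positive would typically be a segment close in time to the anchor and the negative a segment from elsewhere; that pairing choice is an assumption here, since the abstract does not describe it.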

【4】 Robust Generalization of Quadratic Neural Networks via Function Identification
标题：基于函数辨识的二次神经网络鲁棒泛化
链接：https://arxiv.org/abs/2109.10935

作者：Kan Xu,Hamsa Bastani,Osbert Bastani
机构：University of Pennsylvania
摘要：深度学习面临的一个关键挑战是，神经网络通常对底层数据分布的变化不具有鲁棒性。我们从参数识别这一统计概念的角度来研究这个问题。学习理论中的泛化界通常假设测试分布与训练分布接近。相反，如果我们能够识别“真实”参数，那么该模型可以泛化到任意的分布偏移。然而，神经网络通常是过参数化的，使得参数识别变得不可能。我们证明，对于二次神经网络，即使无法识别其参数，我们也可以识别模型所表示的函数。因此，即使在过参数化设置下，我们也可以获得鲁棒的泛化界。我们利用这一结果，为使用二次神经网络的上下文老虎机（contextual bandits）和迁移学习获得了新的界。总的来说，我们的结果表明，我们可以通过设计能够表示真实数据生成过程的模型来提高神经网络的鲁棒性。在实践中，真实的数据生成过程往往非常复杂；因此，我们研究我们的框架如何与神经模块网络相联系，神经模块网络旨在将复杂任务分解为简单任务的组合。当单个神经模块可识别时，我们证明了鲁棒泛化界。
摘要：A key challenge facing deep learning is that neural networks are often not robust to shifts in the underlying data distribution. We study this problem from the perspective of the statistical concept of parameter identification. Generalization bounds from learning theory often assume that the test distribution is close to the training distribution. In contrast, if we can identify the "true" parameters, then the model generalizes to arbitrary distribution shifts. However, neural networks are typically overparameterized, making parameter identification impossible. We show that for quadratic neural networks, we can identify the function represented by the model even though we cannot identify its parameters. Thus, we can obtain robust generalization bounds even in the overparameterized setting. We leverage this result to obtain new bounds for contextual bandits and transfer learning with quadratic neural networks. Overall, our results suggest that we can improve robustness of neural networks by designing models that can represent the true data generating process. In practice, the true data generating process is often very complex; thus, we study how our framework might connect to neural module networks, which are designed to break down complex tasks into compositions of simpler ones. We prove robust generalization bounds when individual neural modules are identifiable.

表征(3篇)

【1】 MARMOT: A Deep Learning Framework for Constructing Multimodal Representations for Vision-and-Language Tasks
标题：MARMOT：构建视觉和语言任务多模态表征的深度学习框架
链接：https://arxiv.org/abs/2109.11526

作者：Patrick Y. Wu,Walter R. Mebane Jr
机构：Department of Political Science, University of Michigan, Department of Political Science and Department of Statistics, University of Michigan
备注：57 pages, 16 figures. Forthcoming in Computational Communication Research
摘要：社交媒体上的政治活动为了解政治行为提供了一个数据丰富的窗口，但海量数据意味着几乎所有社交媒体内容分析都需要数据标注步骤。然而，大多数自动机器分类方法忽略了发布内容的多模态性，只关注文本或图像。最先进的视觉和语言模型无法用于大多数政治科学研究：它们要求所有观测同时具有图像和文本，并且需要计算成本高昂的预训练。本文提出了一种新的视觉和语言框架，称为使用模态翻译的多模态表示（MARMOT）。MARMOT有两个方法论贡献：它可以为缺少图像或文本的观测构建表示，并用模态翻译代替计算成本高昂的预训练。在对2016年美国大选期间报告选举事件的推文进行多标签分类时，MARMOT在20个类别中的19个类别上优于仅使用文本的集成分类器。此外，MARMOT在Hateful Memes数据集上比基准多模态模型的结果有显著改进，将VisualBERT取得的最佳结果的准确率从0.6473提高到0.6760，受试者工作特征曲线下面积（AUC）从0.7141提高到0.7530。
摘要：Political activity on social media presents a data-rich window into political behavior, but the vast amount of data means that almost all content analyses of social media require a data labeling step. However, most automated machine classification methods ignore the multimodality of posted content, focusing either on text or images. State-of-the-art vision-and-language models are unusable for most political science research: they require all observations to have both image and text and require computationally expensive pretraining. This paper proposes a novel vision-and-language framework called multimodal representations using modality translation (MARMOT). MARMOT presents two methodological contributions: it can construct representations for observations missing image or text, and it replaces the computationally expensive pretraining with modality translation. MARMOT outperforms an ensemble text-only classifier in 19 of 20 categories in multilabel classifications of tweets reporting election incidents during the 2016 U.S. general election. Moreover, MARMOT shows significant improvements over the results of benchmark multimodal models on the Hateful Memes dataset, improving the best result set by VisualBERT in terms of accuracy from 0.6473 to 0.6760 and area under the receiver operating characteristic curve (AUC) from 0.7141 to 0.7530.

【2】 Automated Feature-Topic Pairing: Aligning Semantic and Embedding Spaces in Spatial Representation Learning
标题：自动特征-主题配对：在空间表示学习中对齐语义和嵌入空间
链接：https://arxiv.org/abs/2109.11053

作者：Dongjie Wang,Kunpeng Liu,David Mohaisen,Pengyang Wang,Chang-Tien Lu,Yanjie Fu
机构：University of Central Florida, Orlando, Florida, United States, University of Macau, Macau, China, Virginia Tech, Virginia, United States
备注：SIGSPATIAL 2021
摘要：空间数据的自动表征是一种重要的地理智能。作为一种新兴的表征技术，空间表征学习（SRL）利用深度神经网络（DNN）学习空间数据的非线性嵌入特征进行表征。然而，SRL通过DNN的内部层提取特征，因此缺乏语义标签。另一方面，空间实体的文本提供潜在特征标签的语义理解，但对深层SRL模型不敏感。我们如何教SRL模型在文本中发现合适的主题标签，并将学习到的特征和标签配对？本文提出了一个新的问题：特征-主题配对，并提出了一种新的基于粒子群优化算法（PSO）的深度学习框架。具体来说，我们将特征主题配对问题转化为1）潜在嵌入特征空间和2）文本语义主题空间之间的自动对齐任务。我们将两个空间的对齐分解为：1）逐点对齐，表示主题分布和嵌入向量之间的相关性；2）成对对齐，表示特征相似度矩阵和主题相似度矩阵之间的一致性。我们设计了一个基于粒子群算法的求解器来同时选择一组最优的主题，并根据所选主题学习相应的特征。我们开发了一个闭环算法，在1）最小化表示重构和特征主题对齐的损失和2）搜索最佳主题之间迭代。最后，我们给出了大量的实验来证明我们方法的增强性能。
摘要：Automated characterization of spatial data is a kind of critical geographical intelligence. As an emerging technique for characterization, Spatial Representation Learning (SRL) uses deep neural networks (DNNs) to learn non-linear embedded features of spatial data for characterization. However, SRL extracts features by internal layers of DNNs, and thus suffers from a lack of semantic labels. Texts of spatial entities, on the other hand, provide semantic understanding of latent feature labels, but are insensible to deep SRL models. How can we teach an SRL model to discover appropriate topic labels in texts and pair learned features with the labels? This paper formulates a new problem, feature-topic pairing, and proposes a novel Particle Swarm Optimization (PSO) based deep learning framework. Specifically, we formulate the feature-topic pairing problem as an automated alignment task between 1) a latent embedding feature space and 2) a textual semantic topic space. We decompose the alignment of the two spaces into: 1) point-wise alignment, denoting the correlation between a topic distribution and an embedding vector; 2) pair-wise alignment, denoting the consistency between a feature-feature similarity matrix and a topic-topic similarity matrix. We design a PSO based solver to simultaneously select an optimal set of topics and learn corresponding features based on the selected topics. We develop a closed loop algorithm to iterate between 1) minimizing losses of representation reconstruction and feature-topic alignment and 2) searching the best topics. Finally, we present extensive experiments to demonstrate the enhanced performance of our method.

【3】 An Exploration of Learnt Representations of W Jets
标题：关于W喷注习得表征的探讨
链接：https://arxiv.org/abs/2109.10919

作者：Jack H. Collins
机构：SLAC National Accelerator Laboratory
备注：9 pages, 4 figures. Extended version of submission to Neurips 2021 Machine Learning and the Physical Sciences workshop
摘要：我提出了一种在对撞机物理数据（特别是助推（boosted）的$W$喷注）上训练的变分自动编码器（VAE），其重建误差由输入和输出喷注之间的推土机距离（EMD）的近似值给出。该VAE学习到数据流形的具体表示，其潜在空间方向具有语义意义且可解释，这些方向根据其与底层物理生成过程中的物理EMD尺度的关系分层组织。超参数$\beta$控制VAE对数据流形中结构敏感的分辨率。潜在空间结构随$\beta$的变化，以及某些VAE属性的标度行为，提供了对数据集的尺度依赖结构及其信息复杂性的深入了解。我引入了两种由这种标度行为计算得到的、针对所学表示的维度度量。
摘要：I present a Variational Autoencoder (VAE) trained on collider physics data (specifically boosted $W$ jets), with reconstruction error given by an approximation to the Earth Mover's Distance (EMD) between input and output jets. This VAE learns a concrete representation of the data manifold, with semantically meaningful and interpretable latent space directions which are hierarchically organized in terms of their relation to physical EMD scales in the underlying physical generative process. A hyperparameter $\beta$ controls the resolution at which the VAE is sensitive to structures in the data manifold. The variation of the latent space structure with $\beta$, and the scaling of some VAE properties, provide insight into the scale-dependent structure of the dataset and its information complexity. I introduce two measures of the dimensionality of the learnt representation that are calculated from this scaling.

优化|敛散性(5篇)

【1】 Outlier-Robust Sparse Estimation via Non-Convex Optimization
标题：基于非凸优化的离群点稳健稀疏估计
链接：https://arxiv.org/abs/2109.11515

作者：Yu Cheng,Ilias Diakonikolas,Daniel M. Kane,Rong Ge,Shivam Gupta,Mahdi Soltanolkotabi
机构：University of Illinois at Chicago, University of Wisconsin-Madison, University of California, San Diego, Duke University, University of Texas at Austin, University of Southern California
摘要：我们探讨了稀疏约束下离群值鲁棒高维统计与非凸优化之间的联系，重点研究了鲁棒稀疏均值估计和鲁棒稀疏PCA这两个基本任务。我们为这些问题开发了新颖且简单的优化公式，使得相关优化问题的任何近似平稳点都能为基础的鲁棒估计任务产生近似最优解。作为推论，任何能够高效收敛到平稳点的一阶方法都可以为这些任务给出高效的算法。与以前的工作相比，得到的算法简单、实用，并且在更广泛的分布假设下成功。
摘要：We explore the connection between outlier-robust high-dimensional statistics and non-convex optimization in the presence of sparsity constraints, with a focus on the fundamental tasks of robust sparse mean estimation and robust sparse PCA. We develop novel and simple optimization formulations for these problems such that any approximate stationary point of the associated optimization problem yields a near-optimal solution for the underlying robust estimation task. As a corollary, we obtain that any first-order method that efficiently converges to stationarity yields an efficient algorithm for these tasks. The obtained algorithms are simple, practical, and succeed under broader distributional assumptions compared to prior work.

【2】 Multi-Objective Bayesian Optimization over High-Dimensional Search Spaces
标题：高维搜索空间上的多目标贝叶斯优化
链接：https://arxiv.org/abs/2109.10964

作者：Samuel Daulton,David Eriksson,Maximilian Balandat,Eytan Bakshy
机构：Facebook
摘要：在跨科学和工业的许多应用问题中，以高样本效率优化多个相互竞争的目标函数的能力是必不可少的。多目标贝叶斯优化（BO）在这类问题上取得了很好的实证效果，但即使在方法学上取得了最新进展，它也仅限于简单、低维的领域。大多数现有的BO方法在具有几十个以上参数的搜索空间上表现出较差的性能。在这项工作中，我们提出了MORBO，一种高维搜索空间上的多目标贝叶斯优化方法。MORBO在多个信任区域内同时执行局部贝叶斯优化，使其能够探索和识别不同的解决方案，即使目标函数难以全局建模。我们表明，MORBO显著提高了几个高维合成和现实世界多目标问题的样本效率，包括222个参数的车辆设计问题，证明MORBO是解决BO方法无法解决的具有挑战性和重要问题的实用方法。
摘要：The ability to optimize multiple competing objective functions with high sample efficiency is imperative in many applied problems across science and industry. Multi-objective Bayesian optimization (BO) achieves strong empirical performance on such problems, but even with recent methodological advances, it has been restricted to simple, low-dimensional domains. Most existing BO methods exhibit poor performance on search spaces with more than a few dozen parameters. In this work we propose MORBO, a method for multi-objective Bayesian optimization over high-dimensional search spaces. MORBO performs local Bayesian optimization within multiple trust regions simultaneously, allowing it to explore and identify diverse solutions even when the objective functions are difficult to model globally. We show that MORBO significantly advances the state-of-the-art in sample-efficiency for several high-dimensional synthetic and real-world multi-objective problems, including a vehicle design problem with 222 parameters, demonstrating that MORBO is a practical approach for challenging and important problems that were previously out of reach for BO methods.

【3】 Inequality Constrained Stochastic Nonlinear Optimization via Active-Set Sequential Quadratic Programming
标题：基于有效集序列二次规划的不等式约束随机非线性优化
链接：https://arxiv.org/abs/2109.11502

作者：Sen Na,Mihai Anitescu,Mladen Kolar
机构：Department of Statistics, The University of Chicago, Mathematics and Computer Science Division, Argonne National Laboratory, Booth School of Business, The University of Chicago
备注：61 pages, 7 figures
摘要：我们研究具有随机目标以及确定性等式和不等式约束的非线性优化问题，这些问题出现在金融、制造、电力系统以及最近的深度神经网络等众多应用中。我们提出了一种有效集随机序列二次规划算法，使用可微的精确增广拉格朗日函数作为评价函数（merit function）。该算法自适应地选择增广拉格朗日函数的惩罚参数，并进行随机线搜索以确定步长。我们建立了全局收敛性：对于任何初始化，KKT残差的“liminf”几乎必然收敛到零。我们的算法和分析通过允许非线性不等式约束，进一步发展了先前的工作。我们展示了该算法在CUTEst测试集中收集的非线性问题子集上的性能。
摘要：We study nonlinear optimization problems with stochastic objective and deterministic equality and inequality constraints, which emerge in numerous applications including finance, manufacturing, power systems and, recently, deep neural networks. We propose an active-set stochastic sequential quadratic programming algorithm, using a differentiable exact augmented Lagrangian as the merit function. The algorithm adaptively selects the penalty parameters of augmented Lagrangian and performs stochastic line search to decide the stepsize. The global convergence is established: for any initialization, the "liminf" of the KKT residuals converges to zero almost surely. Our algorithm and analysis further develop the prior work \cite{Na2021Adaptive} by allowing nonlinear inequality constraints. We demonstrate the performance of the algorithm on a subset of nonlinear problems collected in the CUTEst test set.

【4】 Fast and Efficient MMD-based Fair PCA via Optimization over Stiefel Manifold
Link: https://arxiv.org/abs/2109.11196

Authors: Junghyun Lee, Gwangsu Kim, Matt Olfat, Mark Hasegawa-Johnson, Chang D. Yoo
Affiliations: Graduate School of AI, KAIST, Seoul, Republic of Korea; School of Electrical Engineering, KAIST, Daejeon, Republic of Korea; UC Berkeley IEOR, Berkeley, CA, USA; Citadel, Chicago, IL, USA
Comments: 23 pages, 18 figures
Abstract: This paper defines fair principal component analysis (PCA) as minimizing the maximum mean discrepancy (MMD) between dimensionality-reduced conditional distributions of different protected classes. The incorporation of MMD naturally leads to an exact and tractable mathematical formulation of fairness with good statistical properties. We formulate the problem of fair PCA subject to MMD constraints as a non-convex optimization over the Stiefel manifold and solve it using the Riemannian Exact Penalty Method with Smoothing (REPMS; Liu and Boumal, 2019). Importantly, we provide local optimality guarantees and explicitly show the theoretical effect of each hyperparameter in practical settings, extending previous results. Experimental comparisons based on synthetic and UCI datasets show that our approach outperforms prior work in explained variance, fairness, and runtime.
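
To make the MMD quantity in the abstract above concrete, here is a minimal sketch of the biased (V-statistic) estimate of squared MMD between two samples. It is an illustration only: the Gaussian kernel, the `bandwidth` parameter, and all function names are assumptions made here, not details taken from the paper.

```python
import math

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))

# Two hypothetical protected groups after projection to 2 dimensions.
group_a = [[0.0, 0.0], [0.1, 0.0]]
group_b = [[5.0, 5.0], [5.1, 5.0]]
print(mmd_squared(group_a, group_a))  # identical samples give zero
print(mmd_squared(group_a, group_b))  # well-separated samples give a large value
```

In a fair-PCA setting of the kind described, the two samples would be the projected data of two protected classes, and a small value would indicate that the projection makes the classes hard to tell apart.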

【5】 Memory-Efficient Convex Optimization for Self-Dictionary Separable Nonnegative Matrix Factorization: A Frank-Wolfe Approach
Link: https://arxiv.org/abs/2109.11135

Authors: Tri Nguyen, Xiao Fu, Ruiyuan Wu
Affiliations: Wu is with the Department of Electronic Engineering
Abstract: Nonnegative matrix factorization (NMF) often relies on the separability condition for tractable algorithm design. Separability-based NMF is mainly handled by two types of approaches, namely, greedy pursuit and convex programming. A notable convex NMF formulation is the so-called self-dictionary multiple measurement vectors (SD-MMV), which can work without knowing the matrix rank a priori, and is arguably more resilient to error propagation relative to greedy pursuit. However, convex SD-MMV incurs a large memory cost that scales quadratically with the problem size. This memory challenge has been around for a decade, and has been a major obstacle to applying convex SD-MMV to big data analytics. This work proposes a memory-efficient algorithm for convex SD-MMV. Our algorithm capitalizes on the special update rules of a classic algorithm from the 1950s, namely, the Frank-Wolfe (FW) algorithm. It is shown that, under reasonable conditions, the FW algorithm solves the noisy SD-MMV problem with a memory cost that grows linearly with the amount of data. To handle noisier scenarios, a smoothed group sparsity regularizer is proposed to improve robustness while maintaining the low memory footprint with guarantees. The proposed approach presents the first linear memory complexity algorithmic framework for convex SD-MMV based NMF. The method is tested over a couple of unsupervised learning tasks, i.e., text mining and community detection, to showcase its effectiveness and memory efficiency.
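
The Frank-Wolfe update rule mentioned above can be illustrated on a toy problem. The sketch below is not the paper's SD-MMV algorithm; it assumes a simple stand-in objective (squared distance to a target vector `b`) over the probability simplex, with the classic step size 2/(k+2). It shows the property the abstract relies on: each iteration moves toward a single vertex, so an update touches one coordinate beyond a uniform rescaling.

```python
def frank_wolfe_simplex(b, num_iters=5000):
    """Minimize 0.5 * ||x - b||^2 over the probability simplex with Frank-Wolfe.

    Toy objective chosen for illustration; the function name is hypothetical.
    """
    n = len(b)
    x = [1.0 / n] * n  # start at the simplex barycenter
    for k in range(num_iters):
        grad = [xi - bi for xi, bi in zip(x, b)]
        # Linear minimization oracle over the simplex: the best vertex is the
        # coordinate with the smallest gradient entry.
        i = min(range(n), key=grad.__getitem__)
        gamma = 2.0 / (k + 2.0)  # classic diminishing step size
        # Convex combination of the current iterate and the vertex e_i.
        x = [(1.0 - gamma) * xi for xi in x]
        x[i] += gamma
    return x

# b already lies in the simplex, so the minimizer is b itself.
approx = frank_wolfe_simplex([0.2, 0.5, 0.3])
print(approx)
```

Because every iterate is a convex combination of simplex vertices, feasibility holds without any projection step, which is the usual reason to prefer Frank-Wolfe on constraint sets of this kind.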

Prediction | Estimation (4 papers)

【1】 Predicting Stress in Remote Learning via Advanced Deep Learning Technologies
Link: https://arxiv.org/abs/2109.11076

Authors: Daben Kyle Liu
Affiliations: Belmont High School, Belmont, MA, USA
Comments: 6 pages, 6 figures, 1 equation
Abstract: COVID-19 has driven most schools to remote learning through online meeting software such as Zoom and Google Meet. Although this trend helps students continue learning without in-person classes, it removes a vital tool that teachers use to teach effectively: visual cues. Without a clear view of a student's face, the teacher may not notice when the student needs assistance, or when the student is not paying attention. To help teachers address this challenge, this project proposes a machine learning based approach that provides real-time monitoring and classification of student mental states so that teachers can better conduct remote teaching. Using publicly available electroencephalogram (EEG) data collections, this research explored four different classification techniques: the classic deep neural network, the traditionally popular support vector machine, the latest convolutional neural network, and the XGBoost model, which has gained popularity recently. This study defined three mental classes: an engaged learning mode, a confused learning mode, and a relaxed mode. The experimental results from this project showed that these selected classifiers have varying potential in classifying EEG signals for mental states. While some of the selected classifiers only yield around 50% accuracy with some delay, the best ones can achieve 80% accurate classification in real time. This could be very beneficial for teachers in need of help making remote teaching adjustments, and for many other potential applications where in-person interactions are not possible.

【2】 Learning Predictive and Interpretable Timeseries Summaries from ICU Data
Link: https://arxiv.org/abs/2109.11043

Authors: Nari Johnson, Sonali Parbhoo, Andrew Slavin Ross, Finale Doshi-Velez
Affiliations: School of Engineering and Applied Sciences, Harvard University, Cambridge, MA
Comments: 10 pages, 3 figures, AMIA 2021 Annual Symposium
Abstract: Machine learning models that utilize patient data across time (rather than just the most recent measurements) have increased performance for many risk stratification tasks in the intensive care unit. However, many of these models and their learned representations are complex and therefore difficult for clinicians to interpret, creating challenges for validation. Our work proposes a new procedure to learn summaries of clinical time-series that are both predictive and easily understood by humans. Specifically, our summaries consist of simple and intuitive functions of clinical data (e.g., falling mean arterial pressure). Our learned summaries outperform traditional interpretable model classes and achieve performance comparable to state-of-the-art deep learning models on an in-hospital mortality classification task.

【3】 Social-Media Activity Forecasting with Exogenous Information Signals
Link: https://arxiv.org/abs/2109.11024

Authors: Kin Wai Ng, Sameera Horawalavithana, Adriana Iamnitchi
Affiliations: Computer Science & Eng., University of South Florida, Tampa, USA
Abstract: Due to their widespread adoption, social media platforms present an ideal environment for studying and understanding social behavior, especially information spread. Modeling social media activity has numerous practical implications such as supporting efforts to analyze strategic information operations, designing intervention techniques to mitigate disinformation, or delivering critical information during disaster relief operations. In this paper we propose a modeling technique that forecasts the topic-specific daily volume of social media activities by using both exogenous signals, such as news or armed conflict records, and endogenous data from the social media platform we model. Empirical evaluations with real datasets from two different platforms and two different contexts, each composed of multiple interrelated topics, demonstrate the effectiveness of our solution.

【4】 Improving Tuberculosis (TB) Prediction using Synthetically Generated Computed Tomography (CT) Images
Link: https://arxiv.org/abs/2109.11480

Authors: Ashia Lewis, Evanjelin Mahmoodi, Yuyue Zhou, Megan Coffee, Elena Sizikova
Affiliations: The University of Alabama; University of California, Santa Cruz; New York University; NYU Grossman School of Medicine
Comments: Accepted to International Conference on Computer Vision (ICCV) 2021 Computer Vision for Automated Medical Diagnosis (CVAMD) Workshop
Abstract: The evaluation of infectious disease processes on radiologic images is an important and challenging task in medical image analysis. Pulmonary infections can often be best imaged and evaluated through computed tomography (CT) scans, which are often not available in low-resource environments and difficult to obtain for critically ill patients. On the other hand, X-ray, a different type of imaging procedure, is inexpensive, often available at the bedside and more widely available, but offers a simpler, two-dimensional image. We show that by relying on a model that learns to generate CT images from X-rays synthetically, we can improve the automatic disease classification accuracy and provide clinicians with a different look at the pulmonary disease process. Specifically, we investigate Tuberculosis (TB), a deadly bacterial infectious disease that predominantly affects the lungs, but also other organ systems. We show that relying on synthetically generated CT improves TB identification by 7.50% and distinguishes TB properties up to 12.16% better than the X-ray baseline.

Other neural networks | Deep learning | Models | Modeling (13 papers)

【1】 Learning Dynamics from Noisy Measurements using Deep Learning with a Runge-Kutta Constraint
Link: https://arxiv.org/abs/2109.11446

Authors: Pawan Goyal, Peter Benner
Affiliations: Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
Abstract: Measurement noise is an integral part of collecting data on a physical process. Thus, noise removal is a necessary step to draw conclusions from these data, and it often becomes essential for constructing dynamical models using these data. We discuss a methodology to learn differential equation(s) using noisy and sparsely sampled measurements. In our methodology, the main innovation lies in the integration of deep neural networks with a classical numerical integration method. Precisely, we aim at learning a neural network that implicitly represents the data and an additional neural network that models the vector fields of the dependent variables. We combine these two networks by enforcing the constraint that the data at the next time-steps can be given by following a numerical integration scheme such as the fourth-order Runge-Kutta scheme. The proposed framework to learn a model predicting the vector field is highly effective under noisy measurements. The approach can handle scenarios where dependent variables are not available at the same temporal grid. We demonstrate the effectiveness of the proposed method for learning models using data obtained from various differential equations. The proposed approach provides a promising methodology to learn dynamic models, where the first-principle understanding remains opaque.
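
The Runge-Kutta constraint described above can be sketched without any neural network. In the snippet below, `vector_field` stands in for the learned network that models the dynamics, and `rk4_residual` is a hypothetical name for the penalty that compares a fourth-order Runge-Kutta prediction with the state at the next time step. Both are assumptions for illustration, not the authors' implementation.

```python
import math

def rk4_step(vector_field, x, h):
    """Advance state x (a list of floats) by one classical RK4 step of size h."""
    def shifted(scale, direction):
        # Return x + scale * direction, elementwise.
        return [xi + scale * di for xi, di in zip(x, direction)]

    k1 = vector_field(x)
    k2 = vector_field(shifted(h / 2.0, k1))
    k3 = vector_field(shifted(h / 2.0, k2))
    k4 = vector_field(shifted(h, k3))
    return [xi + (h / 6.0) * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def rk4_residual(vector_field, x_now, x_next, h):
    """Squared mismatch between the RK4 prediction and the next state."""
    predicted = rk4_step(vector_field, x_now, h)
    return sum((p - q) ** 2 for p, q in zip(predicted, x_next))

# Stand-in dynamics dx/dt = -x, whose exact solution is x(t) = x(0) * exp(-t).
decay = lambda x: [-xi for xi in x]
h = 0.1
print(rk4_residual(decay, [1.0], [math.exp(-h)], h))  # near zero for the true field
```

In a training loop of the kind the abstract describes, a residual like this would be summed over time steps and minimized jointly with a data-fitting term, so that the vector-field model is pushed to be consistent with the denoised trajectory.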

【2】 Programming and Training Rate-Independent Chemical Reaction Networks
Link: https://arxiv.org/abs/2109.11422

Authors: Marko Vasic, Cameron Chalk, Austin Luchsinger, Sarfraz Khurshid, David Soloveichik
Affiliations: The University of Texas at Austin, USA
Abstract: Embedding computation in biochemical environments incompatible with traditional electronics is expected to have wide-ranging impact in synthetic biology, medicine, nanofabrication and other fields. Natural biochemical systems are typically modeled by chemical reaction networks (CRNs), and CRNs can be used as a specification language for synthetic chemical computation. In this paper, we identify a class of CRNs called non-competitive (NC) whose equilibria are absolutely robust to reaction rates and kinetic rate law, because their behavior is captured solely by their stoichiometric structure. Unlike prior work on rate-independent CRNs, checking non-competition and using it as a design criterion is easy and promises robust output. We also present a technique to program NC-CRNs using well-founded deep learning methods, showing a translation procedure from rectified linear unit (ReLU) neural networks to NC-CRNs. In the case of binary weight ReLU networks, our translation procedure is surprisingly tight in the sense that a single bimolecular reaction corresponds to a single ReLU node and vice versa. This compactness argues that neural networks may be a fitting paradigm for programming rate-independent chemical computation. As proof of principle, we demonstrate our scheme with numerical simulations of CRNs translated from neural networks trained on traditional machine learning datasets (IRIS and MNIST), as well as tasks better aligned with potential biological applications including virus detection and spatial pattern formation.

【3】 A survey of Bayesian Network structure learning
Link: https://arxiv.org/abs/2109.11415

Authors: Neville K. Kitson, Anthony C. Constantinou, Zhigao Guo, Yang Liu, Kiattikun Chobtham
Affiliations: Bayesian Artificial Intelligence research lab, Risk and Information Management (RIM) research group, School of Electronic Engineering and Computer Science, Queen Mary University of London (QMUL), London, UK
Abstract: Bayesian Networks (BNs) have become increasingly popular over the last few decades as a tool for reasoning under uncertainty in fields as diverse as medicine, biology, epidemiology, economics and the social sciences. This is especially true in real-world areas where we seek to answer complex questions based on hypothetical evidence to determine actions for intervention. However, determining the graphical structure of a BN remains a major challenge, especially when modelling a problem under causal assumptions. Solutions to this problem include the automated discovery of BN graphs from data, constructing them based on expert knowledge, or a combination of the two. This paper provides a comprehensive review of combinatoric algorithms proposed for learning BN structure from data, describing 61 algorithms including prototypical, well-established and state-of-the-art approaches. The basic approach of each algorithm is described in consistent terms, and the similarities and differences between them are highlighted. Methods of evaluating algorithms and their comparative performance are discussed, including the consistency of claims made in the literature. Approaches for dealing with data noise in real-world datasets and incorporating expert knowledge into the learning process are also covered.

【4】 Energy efficient distributed analytics at the edge of the network for IoT environments
Link: https://arxiv.org/abs/2109.11386

Authors: Lorenzo Valerio, Marco Conti, Andrea Passarella
Affiliations: Institute of Informatics and Telematics, National Research Council, Via Moruzzi, Pisa, Italy
Abstract: Due to the pervasive diffusion of personal mobile and IoT devices, many "smart environments" (e.g., smart cities and smart factories) will be generators of huge amounts of data. Currently, analysis of this data is typically achieved through centralised cloud-based services. However, according to many studies, this approach may present significant issues from the standpoint of data ownership, as well as wireless network capacity. In this paper, we exploit the fog computing paradigm to move computation close to where data is produced. We exploit a well-known distributed machine learning framework (Hypothesis Transfer Learning), and perform data analytics on mobile nodes passing by IoT devices, in addition to fog gateways at the edge of the network infrastructure. We analyse the performance of different configurations of the distributed learning framework, in terms of (i) accuracy obtained in the learning task and (ii) energy spent to send data between the involved nodes. Specifically, we consider reference wireless technologies for communication between the different types of nodes, e.g., LTE, NB-IoT, 802.15.4, 802.11, etc. Our results show that collecting data through the mobile nodes and executing the distributed analytics using short-range communication technologies, such as 802.15.4 and 802.11, reduces the energy consumption of the system by up to 94%, with a loss in accuracy of up to 2% relative to a centralised cloud solution.

【5】 Toward a Unified Framework for Debugging Gray-box Models
Link: https://arxiv.org/abs/2109.11160

Authors: Andrea Bontempelli, Fausto Giunchiglia, Andrea Passerini, Stefano Teso
Affiliations: University of Trento, Trento, Italy
Comments: 11 pages, 1 figure
Abstract: We are concerned with debugging concept-based gray-box models (GBMs). These models acquire task-relevant concepts appearing in the inputs and then compute a prediction by aggregating the concept activations. This work stems from the observation that in GBMs both the concepts and the aggregation function can be affected by different bugs, and that correcting these bugs requires different kinds of corrective supervision. To this end, we introduce a simple schema for identifying and prioritizing bugs in both components, and discuss possible implementations and open problems. At the same time, we introduce a new loss function for debugging the aggregation step that extends existing approaches to align the model's explanations to GBMs by making them robust to how the concepts change during training.

【6】 Joint speaker diarisation and tracking in switching state-space model
Link: https://arxiv.org/abs/2109.11140

Authors: Jeremy H. M. Wong, Yifan Gong
Affiliations: Microsoft, USA
Abstract: Speakers may move around while diarisation is being performed. When a microphone array is used, the instantaneous locations from which the sounds originated can be estimated, and previous investigations have shown that such information can be complementary to speaker embeddings in the diarisation task. However, these approaches often assume that speakers are fairly stationary throughout a meeting. This paper relaxes this assumption by proposing to explicitly track the movements of speakers while jointly performing diarisation within a unified model. A state-space model is proposed, where the hidden state expresses the identity of the current active speaker and the predicted locations of all speakers. The model is implemented as a particle filter. Experiments on a Microsoft rich meeting transcription task show that the proposed joint location tracking and diarisation approach is able to perform comparably with other methods that use location information.

【7】 Learning to Downsample for Segmentation of Ultra-High Resolution Images
Link: https://arxiv.org/abs/2109.11071

Authors: Chen Jin, Ryutaro Tanno, Thomy Mertzanidou, Eleftheria Panagiotaki, Daniel C. Alexander
Affiliations: Centre for Medical Image Computing, University College London, UK; Healthcare Intelligence, Microsoft Research Cambridge, UK
Comments: 19 pages, 17 figures
Abstract: Segmentation of ultra-high resolution images with deep learning is challenging because of their enormous size, often millions or even billions of pixels. Typical solutions drastically downsample the image uniformly to meet memory constraints, implicitly assuming all pixels are equally important by sampling at the same density at all spatial locations. However, this assumption is not true and compromises the performance of deep learning techniques that have proved powerful on standard-sized images. For example, with uniform downsampling (see the green boxed region in Fig.1), the rider and bike do not have enough corresponding samples while the trees and buildings are oversampled, which negatively affects the segmentation prediction from the low-resolution downsampled image. In this work we show that learning the spatially varying downsampling strategy jointly with segmentation offers advantages in segmenting large images with a limited computational budget. Fig.1 shows that our method adapts the sampling density over different locations so that more samples are collected from the small important regions and fewer from the others, which in turn leads to better segmentation accuracy. We show on two public and one local high-resolution datasets that our method consistently learns sampling locations preserving more information and boosting segmentation accuracy over baseline methods.

【8】 Serving DNN Models with Multi-Instance GPUs: A Case of the Reconfigurable Machine Scheduling Problem
Link: https://arxiv.org/abs/2109.11067

Authors: Cheng Tan, Zhichao Li, Jian Zhang, Yu Cao, Sikai Qi, Zherui Liu, Yibo Zhu, Chuanxiong Guo
Affiliations: ByteDance Inc.; Northeastern University; New York University
Abstract: Multi-Instance GPU (MIG) is a new feature introduced by NVIDIA A100 GPUs that partitions one physical GPU into multiple GPU instances. With MIG, A100 can be the most cost-efficient GPU ever for serving Deep Neural Networks (DNNs). However, discovering the most efficient GPU partitions is challenging. The underlying problem is NP-hard; moreover, it is a new abstract problem, which we define as the Reconfigurable Machine Scheduling Problem (RMS). This paper studies serving DNNs with MIG, a new case of RMS. We further propose a solution, MIG-serving. MIG-serving is an algorithm pipeline that blends a variety of newly designed algorithms and customized classic algorithms, including a heuristic greedy algorithm, Genetic Algorithm (GA), and Monte Carlo Tree Search algorithm (MCTS). We implement MIG-serving on Kubernetes. Our experiments show that compared to using A100 as-is, MIG-serving can save up to 40% of GPUs while providing the same throughput.

【9】 On Bonus-Based Exploration Methods in the Arcade Learning Environment
Link: https://arxiv.org/abs/2109.11052

Authors: Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare
Affiliations: MILA, Université de Montréal; Google Brain
Abstract: Research on exploration in reinforcement learning, as applied to Atari 2600 game-playing, has emphasized tackling difficult exploration problems such as Montezuma's Revenge (Bellemare et al., 2016). Recently, bonus-based exploration methods, which explore by augmenting the environment reward, have reached above-human average performance on such domains. In this paper we reassess popular bonus-based exploration methods within a common evaluation framework. We combine Rainbow (Hessel et al., 2018) with different exploration bonuses and evaluate its performance on Montezuma's Revenge, Bellemare et al.'s set of hard exploration games with sparse rewards, and the whole Atari 2600 suite. We find that while exploration bonuses lead to higher scores on Montezuma's Revenge, they do not provide meaningful gains over the simpler $\epsilon$-greedy scheme. In fact, we find that methods that perform best on that game often underperform $\epsilon$-greedy on easy exploration Atari 2600 games. We find that our conclusions remain valid even when hyperparameters are tuned for these easy-exploration games. Finally, we find that none of the methods surveyed benefit from additional training samples (1 billion frames, versus Rainbow's 200 million) on Bellemare et al.'s hard exploration games. Our results suggest that recent gains in Montezuma's Revenge may be better attributed to architecture change, rather than better exploration schemes; and that the real pace of progress in exploration research for Atari 2600 games may have been obfuscated by good results on a single domain.
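
The two exploration styles contrasted above can be sketched in a few lines. The snippet assumes a count-based bonus of the form beta / sqrt(N), which is one common choice and not necessarily one of the bonuses evaluated in the paper; the function names and the `beta` parameter are hypothetical.

```python
import math
import random

def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

def augmented_reward(env_reward, visit_count, beta=0.1):
    """Add an exploration bonus that shrinks as a state is visited more often.

    The beta / sqrt(N) form is an assumed, illustrative bonus.
    """
    return env_reward + beta / math.sqrt(max(visit_count, 1))

q = [0.1, 0.9, 0.3]
print(epsilon_greedy_action(q, epsilon=0.0))        # greedy choice: action 1
print(augmented_reward(1.0, visit_count=4, beta=0.5))  # 1.0 + 0.5 / 2 = 1.25
```

The design difference is where exploration enters: epsilon-greedy perturbs action selection and leaves the reward untouched, while a bonus-based method leaves action selection greedy and changes the reward the agent learns from.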

【10】 Making Human-Like Trade-offs in Constrained Environments by Learning from Demonstrations
Link: https://arxiv.org/abs/2109.11018

Authors: Arie Glazier, Andrea Loreggia, Nicholas Mattei, Taher Rahgooy, Francesca Rossi, K. Brent Venable
Affiliations: Tulane University, New Orleans, LA, USA; European University Institute, Italy; University of West Florida, Pensacola, FL, USA; IBM Research, Yorktown Heights, NY, USA; Institute for Human and Machine Cognition, Pensacola, FL, USA
Abstract: Many real-life scenarios require humans to make difficult trade-offs: do we always follow all the traffic rules or do we violate the speed limit in an emergency? These scenarios force us to evaluate the trade-off between collective norms and our own personal objectives. To create effective AI-human teams, we must equip AI agents with a model of how humans make trade-offs in complex, constrained environments. These agents will be able to mirror human behavior or to draw human attention to situations where decision making could be improved. To this end, we propose a novel inverse reinforcement learning (IRL) method for learning implicit hard and soft constraints from demonstrations, enabling agents to quickly adapt to new settings. In addition, learning soft constraints over states, actions, and state features allows agents to transfer this knowledge to new domains that share similar aspects. We then use the constraint learning method to implement a novel system architecture that leverages a cognitive model of human decision making, multi-alternative decision field theory (MDFT), to orchestrate competing objectives. We evaluate the resulting agent on trajectory length, number of violated constraints, and total reward, demonstrating that our agent architecture is both general and achieves strong performance. Thus we are able to capture and replicate human-like trade-offs from demonstrations in environments where constraints are not explicit.

【11】 The CAMELS Multifield Dataset: Learning the Universe's Fundamental Parameters with Artificial Intelligence
Link: https://arxiv.org/abs/2109.10915

Authors: Francisco Villaescusa-Navarro, Shy Genel, Daniel Angles-Alcazar, Leander Thiele, Romeel Dave, Desika Narayanan, Andrina Nicola, Yin Li, Pablo Villanueva-Domingo, Benjamin Wandelt, David N. Spergel, Rachel S. Somerville, Jose Manuel Zorrilla Matilla, Faizan G. Mohammad, Sultan Hassan, Helen Shao, Digvijay Wadekar, Michael Eickenberg, Kaze W. K. Wong, Gabriella Contardo, Yongseok Jo, Emily Moser, Erwin T. Lau, Luis Fernando Machado Poletti Valle, Lucia A. Perez, Daisuke Nagai, Nicholas Battaglia, Mark Vogelsberger
Affiliations: Department of Astrophysical Sciences, Princeton University, Peyton Hall, Princeton, NJ, USA; Center for Computational Astrophysics, Flatiron Institute, New York, NY, USA; Columbia Astrophysics Laboratory, Columbia University, New York, NY, USA
Comments: 17 pages, 1 figure. Third paper of a series of four. Hundreds of thousands of labeled 2D maps and 3D grids from thousands of simulated universes are publicly available online
Abstract: We present the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) Multifield Dataset, CMD, a collection of hundreds of thousands of 2D maps and 3D grids containing many different properties of cosmic gas, dark matter, and stars from 2,000 distinct simulated universes at several cosmic times. The 2D maps and 3D grids represent cosmic regions that span $\sim$100 million light years and have been generated from thousands of state-of-the-art hydrodynamic and gravity-only N-body simulations from the CAMELS project. Designed to train machine learning models, CMD is the largest dataset of its kind, containing more than 70 Terabytes of data. In this paper we describe CMD in detail and outline a few of its applications. We focus our attention on one such task, parameter inference, formulating the problems we face as a challenge to the community. We release all data and provide further technical details at https://camels-multifield-dataset.readthedocs.io.

【12】 Learning the noise fingerprint of quantum devices
Link: https://arxiv.org/abs/2109.11405

Authors: Stefano Martina, Lorenzo Buffoni, Stefano Gherardini, Filippo Caruso
Affiliations: Department of Physics and Astronomy, University of Florence, Via Sansone, Sesto Fiorentino, I-, FI, Italy, European Laboratory for Non-Linear Spectroscopy (LENS), University of Florence, Via Nello Carrara, Sesto Fiorentino
Comments: 20 pages, 3 figures, 5 tables, research article
Abstract: Noise sources unavoidably affect any quantum technological device. Noise's main features are expected to strictly depend on the physical platform on which the quantum device is realized, in the form of a distinguishable fingerprint. Noise sources are also expected to evolve and change over time. Here, we first identify and then characterize experimentally the noise fingerprint of IBM cloud-available quantum computers, by resorting to machine learning techniques designed to classify noise distributions using time-ordered sequences of measured outcome probabilities.

【13】 Causal Discovery in High-Dimensional Point Process Networks with Hidden Nodes
Link: https://arxiv.org/abs/2109.10947

Authors: Xu Wang, Ali Shojaie
Affiliations: Department of Biostatistics, University of Washington
Comments: 31 pages, 5 figures
Abstract: Thanks to technological advances leading to near-continuous time observations, emerging multivariate point process data offer new opportunities for causal discovery. However, a key obstacle in achieving this goal is that many relevant processes may not be observed in practice. Naive estimation approaches that ignore these hidden variables can generate misleading results because of the unadjusted confounding. To plug this gap, we propose a deconfounding procedure to estimate high-dimensional point process networks with only a subset of the nodes being observed. Our method allows flexible connections between the observed and unobserved processes. It also allows the number of unobserved processes to be unknown and potentially larger than the number of observed nodes. Theoretical analyses and numerical studies highlight the advantages of the proposed method in identifying causal interactions among the observed processes.

Other (15 papers)

【1】 Multidimensional Scaling: Approximation and Complexity
Link: https://arxiv.org/abs/2109.11505

Authors: Erik Demaine, Adam Hesterberg, Frederic Koehler, Jayson Lynch, John Urschel
Affiliations: Paulson School of Engineering and Applied Sciences, Harvard University, USA, Department of Mathematics, USA, Cheriton School of Computer Science, University of Waterloo
Abstract: Metric Multidimensional scaling (MDS) is a classical method for generating meaningful (non-linear) low-dimensional embeddings of high-dimensional data. MDS has a long history in the statistics, machine learning, and graph drawing communities. In particular, the Kamada-Kawai force-directed graph drawing method is equivalent to MDS and is one of the most popular ways in practice to embed graphs into low dimensions. Despite its ubiquity, our theoretical understanding of MDS remains limited as its objective function is highly non-convex. In this paper, we prove that minimizing the Kamada-Kawai objective is NP-hard and give a provable approximation algorithm for optimizing it, which in particular is a PTAS on low-diameter graphs. We supplement this result with experiments suggesting possible connections between our greedy approximation algorithm and gradient-based methods.
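
For readers unfamiliar with the objective named in this abstract, the following is a minimal sketch of one common normalization of the Kamada-Kawai energy, the sum over vertex pairs of (||x_i - x_j|| / d_ij - 1)^2, where d_ij is the graph distance. The function name, the toy path graph, and the two layouts are illustrative assumptions and are not taken from the paper; the sketch only evaluates the objective and does not implement the paper's approximation algorithm.

```python
import math

def kamada_kawai_energy(positions, graph_dist):
    """Sum over vertex pairs of (embedded distance / graph distance - 1)^2."""
    n = len(positions)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            embedded = math.dist(positions[i], positions[j])
            energy += (embedded / graph_dist[i][j] - 1.0) ** 2
    return energy

# Hypothetical toy input: shortest-path distances of the path graph 0-1-2.
path_dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # preserves every graph distance
folded = [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0)]    # collapses the two endpoints

print(kamada_kawai_energy(straight, path_dist))
print(kamada_kawai_energy(folded, path_dist))
```

The straight layout reproduces all graph distances, so its energy is zero, while the folded layout is penalized only for the collapsed endpoint pair.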

【2】 Exploring Machine Teaching with Children
Link: https://arxiv.org/abs/2109.11434

Authors: Utkarsh Dwivedi, Jaina Gandhi, Raj Parikh, Merijke Coenraad, Elizabeth Bonsignore, Hernisa Kacorri
Affiliations: University of Maryland
Abstract: Iteratively building and testing machine learning models can help children develop creativity, flexibility, and comfort with machine learning and artificial intelligence. We explore how children use machine teaching interfaces with a team of 14 children (aged 7-13 years) and adult co-designers. Children trained image classifiers and tested each other's models for robustness. Our study illuminates how children reason about ML concepts, offering these insights for designing machine teaching experiences for children: (i) ML metrics (e.g. confidence scores) should be visible for experimentation; (ii) ML activities should enable children to exchange models for promoting reflection and pattern recognition; and (iii) the interface should allow quick data inspection (e.g. images vs. gestures).

【3】 Robin Hood and Matthew Effects -- Differential Privacy Has Disparate Impact on Synthetic Data
Link: https://arxiv.org/abs/2109.11429

Authors: Georgi Ganev, Bristena Oprisanu, Emiliano De Cristofaro
Affiliations: UCL, Hazy, Alan Turing Institute
Abstract: Generative models trained using Differential Privacy (DP) are increasingly used to produce and share synthetic data in a privacy-friendly manner. In this paper, we set out to analyze the impact of DP on these models vis-a-vis underrepresented classes and subgroups of data. We do so from two angles: 1) the size of classes and subgroups in the synthetic data, and 2) classification accuracy on them. We also evaluate the effect of various levels of imbalance and privacy budgets. Our experiments, conducted using three state-of-the-art DP models (PrivBayes, DP-WGAN, and PATE-GAN), show that DP results in opposite size distributions in the generated synthetic data. More precisely, it affects the gap between the majority and minority classes and subgroups, either reducing it (a "Robin Hood" effect) or increasing it ("Matthew" effect). However, both of these size shifts lead to similar disparate impacts on a classifier's accuracy, affecting disproportionately more the underrepresented subparts of the data. As a result, we call for caution when analyzing or training a model on synthetic data, or risk treating different subpopulations unevenly, which might also lead to unreliable conclusions.

【4】 Stochastic Normalizing Flows for Inverse Problems: a Markov Chains Viewpoint
Link: https://arxiv.org/abs/2109.11375

Authors: Paul Hagemann, Johannes Hertrich, Gabriele Steidl
Abstract: To overcome topological constraints and improve the expressiveness of normalizing flow architectures, Wu, Köhler and Noé introduced stochastic normalizing flows which combine deterministic, learnable flow transformations with stochastic sampling methods. In this paper, we consider stochastic normalizing flows from a Markov chain point of view. In particular, we replace transition densities by general Markov kernels and establish proofs via Radon-Nikodym derivatives, which allows incorporating distributions without densities in a sound way. Further, we generalize the results for sampling from posterior distributions as required in inverse problems. The performance of the proposed conditional stochastic normalizing flow is demonstrated by numerical examples.

【5】 Federated Feature Selection for Cyber-Physical Systems of Systems
Link: https://arxiv.org/abs/2109.11323

Authors: Pietro Cassarà, Alberto Gotta, Lorenzo Valerio
Abstract: Autonomous systems generate a huge amount of multimodal data that are collected and processed on the Edge, in order to enable AI-based services. The collected datasets are pre-processed in order to extract informative attributes, called features, which are used to feed AI algorithms. Due to the limited computational and communication resources of some CPS, like autonomous vehicles, selecting the subset of relevant features from a dataset is of the utmost importance, in order to improve the result achieved by learning methods and to reduce computation and communication costs. Precisely, feature selection is the candidate approach, which assumes that data contain a certain number of redundant or irrelevant attributes that can be eliminated. The quality of our methods is confirmed by the promising results achieved on two different data sets. In this work, we propose, for the first time, a federated feature selection method suitable for being executed in a distributed manner. Precisely, our results show that a fleet of autonomous vehicles finds a consensus on the optimal set of features that they exploit to reduce data transmission up to 99% with negligible information loss.

【6】 Coded Computation across Shared Heterogeneous Workers with Communication Delay
Link: https://arxiv.org/abs/2109.11246

Authors: Yuxuan Sun, Fan Zhang, Junlin Zhao, Sheng Zhou, Zhisheng Niu, Deniz Gündüz
Comments: Submitted to IEEE for possible publication
Abstract: Distributed computing enables large-scale computation tasks to be processed over multiple workers in parallel. However, the randomness of communication and computation delays across workers causes the straggler effect, which may degrade the performance. Coded computation helps to mitigate the straggler effect, but the amount of redundant load and their assignment to the workers should be carefully optimized. In this work, we consider a multi-master heterogeneous-worker distributed computing scenario, where multiple matrix multiplication tasks are encoded and allocated to workers for parallel computation. The goal is to minimize the communication plus computation delay of the slowest task. We propose worker assignment, resource allocation and load allocation algorithms under both dedicated and fractional worker assignment policies, where each worker can process the encoded tasks of either a single master or multiple masters, respectively. Then, the non-convex delay minimization problem is solved by employing the Markov's inequality-based approximation, Karush-Kuhn-Tucker conditions, and successive convex approximation methods. Through extensive simulations, we show that the proposed algorithms can reduce the task completion delay compared to the benchmarks, and observe that dedicated and fractional worker assignment policies have different scopes of applications.

【7】 Secure PAC Bayesian Regression via Real Shamir Secret Sharing
Link: https://arxiv.org/abs/2109.11200

Authors: Jaron Skovsted Gundersen, Bulut Kuskonmaz, Rafael Wisniewski
Abstract: A common approach in machine learning is to generate a model by using a huge amount of training data to predict the test data instances as accurately as possible. Nonetheless, concerns about data privacy are increasingly raised, but not always addressed. We present a secure protocol for obtaining a linear model relying on a recently described technique called real number secret sharing. We take as our starting point the PAC Bayesian bounds and deduce a closed form for the model parameters which depends on the data and the prior from the PAC Bayesian bounds. To obtain the model parameters one needs to solve a linear system. However, we consider the situation where several parties hold different data instances and they are not willing to give up the privacy of the data. Hence, we suggest using real number secret sharing and multiparty computation to share the data and solve the linear regression in a secure way without violating the privacy of data. We suggest two methods, an inverse method and a Gaussian elimination method, and compare these methods at the end.
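
As a rough illustration of the secret-sharing idea this abstract builds on, the sketch below implements Shamir-style sharing with exact rationals: a random polynomial whose constant term is the secret is evaluated at one point per party, and the secret is recovered by Lagrange interpolation at zero. This is an assumption-laden stand-in, not the paper's protocol: real number secret sharing as described there works over the reals and has its own privacy analysis, which this sketch does not reproduce. The function names, coefficient range, and example values are hypothetical.

```python
import random
from fractions import Fraction

def make_shares(secret, threshold, num_parties, rng):
    """Evaluate a random polynomial of degree threshold-1 with f(0) = secret at x = 1..n."""
    coeffs = [Fraction(secret)] + [
        Fraction(rng.randint(-1000, 1000)) for _ in range(threshold - 1)
    ]
    return [
        (x, sum(c * Fraction(x) ** k for k, c in enumerate(coeffs)))
        for x in range(1, num_parties + 1)
    ]

def reconstruct(shares):
    """Recover f(0) by Lagrange interpolation from at least `threshold` shares."""
    secret = Fraction(0)
    for i, (xi, yi) in enumerate(shares):
        weight = Fraction(1)
        for j, (xj, _) in enumerate(shares):
            if i != j:
                weight *= Fraction(-xj, xi - xj)
        secret += yi * weight
    return secret

rng = random.Random(0)
shares_a = make_shares(Fraction(13, 4), threshold=3, num_parties=5, rng=rng)
shares_b = make_shares(Fraction(-2, 1), threshold=3, num_parties=5, rng=rng)

# Shares are linear: adding them point by point yields shares of the sum.
shares_sum = [(x, ya + yb) for (x, ya), (_, yb) in zip(shares_a, shares_b)]
print(reconstruct(shares_a[:3]), reconstruct(shares_sum[1:4]))
```

The linearity shown in the last step is what lets parties compute sums of private values on shares, which is the kind of operation a secure linear solve relies on.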

【8】 Training Deep Spiking Auto-encoders without Bursting or Dying Neurons through Regularization
Link: https://arxiv.org/abs/2109.11045

Authors: Justus F. Hübotter, Pablo Lanillos, Jakub M. Tomczak
Affiliations: Donders Institute for Brain, Cognition and Behaviour, Department of Artificial Intelligence, Radboud University, the Netherlands, Department of Computer Science, Vrije Universiteit Amsterdam
Comments: Under review
Abstract: Spiking neural networks are a promising approach towards next-generation models of the brain in computational neuroscience. Moreover, compared to classic artificial neural networks, they could serve as an energy-efficient deployment of AI by enabling fast computation in specialized neuromorphic hardware. However, training deep spiking neural networks, especially in an unsupervised manner, is challenging and the performance of a spiking model is significantly hindered by dead or bursting neurons. Here, we apply end-to-end learning with membrane potential-based backpropagation to a spiking convolutional auto-encoder with multiple trainable layers of leaky integrate-and-fire neurons. We propose bio-inspired regularization methods to control the spike density in latent representations. In the experiments, we show that applying regularization on membrane potential and spiking output successfully avoids both dead and bursting neurons and significantly decreases the reconstruction error of the spiking auto-encoder. Training regularized networks on the MNIST dataset yields image reconstruction quality comparable to non-spiking baseline models (deterministic and variational auto-encoder) and indicates improvement upon earlier approaches. Importantly, we show that, unlike the variational auto-encoder, the spiking latent representations display structure associated with the image class.

【9】 Conditional Poisson Stochastic Beam Search
Link: https://arxiv.org/abs/2109.11034

Authors: Clara Meister, Afra Amini, Tim Vieira, Ryan Cotterell
Affiliations: ETH Zürich, Johns Hopkins University, University of Cambridge
Abstract: Beam search is the default decoding strategy for many sequence generation tasks in NLP. The set of approximate K-best items returned by the algorithm is a useful summary of the distribution for many applications; however, the candidates typically exhibit high overlap and may give a highly biased estimate for expectations under our model. These problems can be addressed by instead using stochastic decoding strategies. In this work, we propose a new method for turning beam search into a stochastic process: Conditional Poisson stochastic beam search. Rather than taking the maximizing set at each iteration, we sample K candidates without replacement according to the conditional Poisson sampling design. We view this as a more natural alternative to Kool et al. 2019's stochastic beam search (SBS). Furthermore, we show how samples generated under the CPSBS design can be used to build consistent estimators and sample diverse sets from sequence models. In our experiments, we observe CPSBS produces lower variance and more efficient estimators than SBS, even showing improvements in high entropy settings.
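
To make the sampling design mentioned in this abstract concrete, here is a minimal sketch of conditional Poisson sampling by rejection: each item is included independently with its own probability, and the draw is kept only if exactly K items were selected. This illustrates the sampling design alone, under assumed inclusion probabilities; it is not the paper's beam search procedure, which would derive the probabilities from a sequence model and would likely avoid rejection. All names and numbers are hypothetical.

```python
import random

def conditional_poisson_sample(inclusion_probs, k, rng, max_tries=100_000):
    """Draw independent Bernoulli inclusions and keep the first draw with exactly k items."""
    for _ in range(max_tries):
        chosen = [i for i, p in enumerate(inclusion_probs) if rng.random() < p]
        if len(chosen) == k:
            return chosen
    raise RuntimeError("no sample of the requested size was drawn")

rng = random.Random(0)
candidate_probs = [0.9, 0.5, 0.5, 0.1]  # hypothetical per-candidate inclusion probabilities
beam = conditional_poisson_sample(candidate_probs, k=2, rng=rng)
print(beam)
```

Unlike taking the top K candidates, repeated calls return different size-K subsets, with high-probability candidates appearing more often.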

【10】 Exploring Decomposition for Table-based Fact Verification
Link: https://arxiv.org/abs/2109.11020

Authors: Xiaoyu Yang, Xiaodan Zhu
Affiliations: Ingenuity Labs Research Institute & ECE, Queen’s University, Canada
Comments: Accepted by Findings of EMNLP 2021
Abstract: Fact verification based on structured data is challenging as it requires models to understand both natural language and symbolic operations performed over tables. Although pre-trained language models have demonstrated a strong capability in verifying simple statements, they struggle with complex statements that involve multiple operations. In this paper, we improve fact verification by decomposing complex statements into simpler subproblems. Leveraging the programs synthesized by a weakly supervised semantic parser, we propose a program-guided approach to constructing a pseudo dataset for decomposition model training. The subproblems, together with their predicted answers, serve as the intermediate evidence to enhance our fact verification model. Experiments show that our proposed approach achieves the new state-of-the-art performance, an 82.7% accuracy, on the TabFact benchmark.

【11】 Generalisations and improvements of New Q-Newton's method Backtracking
Link: https://arxiv.org/abs/2109.11395

Authors: Tuyen Trung Truong
Comments: 14 pages. arXiv admin note: text overlap with arXiv:2108.10249
Abstract: In this paper, we propose a general framework for the algorithm New Q-Newton's method Backtracking, developed in the author's previous work. For a symmetric, square real matrix $A$, we define $minsp(A):=\min _{||e||=1} ||Ae||$. Given a $C^2$ cost function $f:\mathbb{R}^m\rightarrow \mathbb{R}$ and a real number $0}{||A(x)e_i(x)||}e_i(x);$$ (we can also normalise by $w(x)/\max \{1,||w(x)||\}$ when needed) $\gamma (x)>0$ learning rate chosen by Backtracking line search so that Armijo's condition is satisfied: $$f(x-\gamma (x)w(x))-f(x)\leq -\frac{1}{3}\gamma (x).$$ The update rule for our algorithm is $x\mapsto H(x)=x-\gamma (x)w(x)$. In New Q-Newton's method Backtracking, the choices are $\tau =1+\alpha >1$ and $e_1(x),\ldots ,e_m(x)$'s are eigenvectors of $\nabla ^2f(x)$. In this paper, we allow more flexibility and generality, for example $\tau$ can be chosen to be $<1$ or $e_1(x),\ldots ,e_m(x)$'s are not necessarily eigenvectors of $\nabla ^2f(x)$. New Q-Newton's method Backtracking (as well as Backtracking gradient descent) is a special case, and some versions have flavours of quasi-Newton's methods. Several versions allow good theoretical guarantees. An application to solving systems of polynomial equations is given.
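
The backtracking line search mentioned in this abstract can be sketched as follows. The sketch assumes the standard form of Armijo's condition, f(x - gamma*w) - f(x) <= -c * gamma * <grad f(x), w>, with c = 1/3; the inner product is an assumption, since the formula in the extracted abstract appears truncated. The step direction w is simply the gradient here, not the paper's eigenvector-based direction, and the function names, halving factor, and quadratic test function are illustrative.

```python
def armijo_backtracking(f, grad_fx, x, w, c=1.0 / 3.0, shrink=0.5, max_halvings=60):
    """Shrink gamma from 1 until f(x - gamma*w) - f(x) <= -c * gamma * <grad f(x), w>."""
    fx = f(x)
    slope = sum(g * wi for g, wi in zip(grad_fx, w))
    gamma = 1.0
    for _ in range(max_halvings):
        trial = [xi - gamma * wi for xi, wi in zip(x, w)]
        if f(trial) - fx <= -c * gamma * slope:
            break
        gamma *= shrink
    return gamma

def quadratic(x):
    """Hypothetical ill-conditioned test function x0^2 + 10 * x1^2."""
    return x[0] ** 2 + 10.0 * x[1] ** 2

x0 = [1.0, 1.0]
gradient = [2.0 * x0[0], 20.0 * x0[1]]  # gradient of `quadratic` at x0
gamma = armijo_backtracking(quadratic, gradient, x0, gradient)
x1 = [xi - gamma * gi for xi, gi in zip(x0, gradient)]
print(gamma, quadratic(x0), quadratic(x1))
```

The accepted step is much smaller than the initial guess because the steep second coordinate would otherwise overshoot, which is the situation backtracking is meant to handle.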

【12】 Quantum algorithms for group convolution, cross-correlation, and equivariant transformations
Link: https://arxiv.org/abs/2109.11330

Authors: Grecia Castelazo, Quynh T. Nguyen, Giacomo De Palma, Dirk Englund, Seth Lloyd, Bobak T. Kiani
Affiliations: Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Scuola Normale Superiore, Pisa, Italy, Research Laboratory for Electronics, Massachusetts Institute of Technology, Cambridge, MA, USA
Abstract: Group convolutions and cross-correlations, which are equivariant to the actions of group elements, are commonly used in mathematics to analyze or take advantage of symmetries inherent in a given problem setting. Here, we provide efficient quantum algorithms for performing linear group convolutions and cross-correlations on data stored as quantum states. Runtimes for our algorithms are logarithmic in the dimension of the group thus offering an exponential speedup compared to classical algorithms when input data is provided as a quantum state and linear operations are well conditioned. Motivated by the rich literature on quantum algorithms for solving algebraic problems, our theoretical framework opens a path for quantizing many algorithms in machine learning and numerical methods that employ group operations.

【13】 ChannelAugment: Improving generalization of multi-channel ASR by training with input channel randomization
Link: https://arxiv.org/abs/2109.11225

Authors: Marco Gaudesi, Felix Weninger, Dushyant Sharma, Puming Zhan
Affiliations: Nuance Communications
Comments: To appear in ASRU 2021
Abstract: End-to-end (E2E) multi-channel ASR systems show state-of-the-art performance in far-field ASR tasks by joint training of a multi-channel front-end along with the ASR model. The main limitation of such systems is that they are usually trained with data from a fixed array geometry, which can lead to degradation in accuracy when a different array is used in testing. This makes it challenging to deploy these systems in practice, as it is costly to retrain and deploy different models for various array configurations. To address this, we present a simple and effective data augmentation technique, which is based on randomly dropping channels in the multi-channel audio input during training, in order to improve the robustness to various array configurations at test time. We call this technique ChannelAugment, in contrast to SpecAugment (SA) which drops time and/or frequency components of a single channel input audio. We apply ChannelAugment to the Spatial Filtering (SF) and Minimum Variance Distortionless Response (MVDR) neural beamforming approaches. For SF, we observe 10.6% WER improvement across various array configurations employing different numbers of microphones. For MVDR, we achieve a 74% reduction in training time without causing degradation of recognition accuracy.
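
The core idea described in this abstract, randomly dropping input channels during training, might look like the sketch below. It is only an illustration under assumptions: the abstract does not say whether dropped channels are removed or masked, how many are kept, or how the choice is distributed, so the `min_keep` parameter, the uniform choice, and the toy four-microphone input are all hypothetical.

```python
import random

def channel_augment(channels, min_keep, rng):
    """Keep a random subset of at least min_keep channels, preserving channel order."""
    num_keep = rng.randint(min_keep, len(channels))
    kept_indices = sorted(rng.sample(range(len(channels)), num_keep))
    return [channels[i] for i in kept_indices]

# Hypothetical four-microphone utterance with three samples per channel.
audio = [
    [0.1, 0.2, 0.3],
    [0.0, 0.1, 0.0],
    [0.4, 0.3, 0.2],
    [0.2, 0.2, 0.1],
]
rng = random.Random(0)
augmented = channel_augment(audio, min_keep=2, rng=rng)
print(len(augmented), "of", len(audio), "channels kept")
```

Applied once per training example, this exposes the model to many microphone subsets without collecting data from additional array geometries.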

【14】 Rank Overspecified Robust Matrix Recovery: Subgradient Method and Exact Recovery
Link: https://arxiv.org/abs/2109.11154

Authors: Lijun Ding, Liwei Jiang, Yudong Chen, Qing Qu, Zhihui Zhu
Comments: 75 pages, 3 figures
Abstract: We study the robust recovery of a low-rank matrix from sparsely and grossly corrupted Gaussian measurements, with no prior knowledge on the intrinsic rank. We consider the robust matrix factorization approach. We employ a robust $\ell_1$ loss function and deal with the challenge of the unknown rank by using an overspecified factored representation of the matrix variable. We then solve the associated nonconvex nonsmooth problem using a subgradient method with diminishing stepsizes. We show that under a regularity condition on the sensing matrices and corruption, which we call restricted direction preserving property (RDPP), even with rank overspecified, the subgradient method converges to the exact low-rank solution at a sublinear rate. Moreover, our result is more general in the sense that it automatically speeds up to a linear rate once the factor rank matches the unknown rank. On the other hand, we show that the RDPP condition holds under generic settings, such as Gaussian measurements under independent or adversarial sparse corruptions, where the result could be of independent interest. Both the exact recovery and the convergence rate of the proposed subgradient method are numerically verified in the overspecified regime. Moreover, our experiment further shows that our particular design of diminishing stepsize effectively prevents overfitting for robust recovery under overparameterized models, such as robust matrix sensing and learning robust deep image prior. This regularization effect is worth further investigation.

【15】 Weighted Low Rank Matrix Approximation and Acceleration
Link: https://arxiv.org/abs/2109.11057

Authors: Elena Tuzhilina, Trevor Hastie
Affiliations: Department of Statistics, Stanford University, Stanford, CA
Abstract: Low-rank matrix approximation is one of the central concepts in machine learning, with applications in dimension reduction, de-noising, multivariate statistical methodology, and many more. A recent extension to LRMA is called low-rank matrix completion (LRMC). It solves the LRMA problem when some observations are missing and is especially useful for recommender systems. In this paper, we consider an element-wise weighted generalization of LRMA. The resulting weighted low-rank matrix approximation technique therefore covers LRMC as a special case with binary weights. WLRMA has many applications. For example, it is an essential component of GLM optimization algorithms, where an exponential family is used to model the entries of a matrix, and the matrix of natural parameters admits a low-rank structure. We propose an algorithm for solving the weighted problem, as well as two acceleration techniques. Further, we develop a non-SVD modification of the proposed algorithm that is able to handle extremely high-dimensional data. We compare the performance of all the methods on a small simulation example as well as a real-data application.
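
To illustrate the weighted objective described in this abstract, the sketch below minimizes the sum of w_ij * (x_ij - u_i * v_j)^2 for a rank-1 model by alternating least squares, with binary weights so that it doubles as a tiny matrix-completion example. Alternating least squares is a swapped-in textbook technique chosen for brevity; it is not the algorithm or the acceleration schemes proposed in the paper, and the function names, matrix, and weights are hypothetical.

```python
def weighted_rank1(X, W, iters=200):
    """Alternating least squares for min over u, v of sum_ij W_ij * (X_ij - u_i * v_j)^2."""
    rows, cols = len(X), len(X[0])
    u, v = [1.0] * rows, [1.0] * cols
    for _ in range(iters):
        for i in range(rows):
            den = sum(W[i][j] * v[j] ** 2 for j in range(cols))
            if den > 0:
                u[i] = sum(W[i][j] * X[i][j] * v[j] for j in range(cols)) / den
        for j in range(cols):
            den = sum(W[i][j] * u[i] ** 2 for i in range(rows))
            if den > 0:
                v[j] = sum(W[i][j] * X[i][j] * u[i] for i in range(rows)) / den
    return u, v

def weighted_loss(X, W, u, v):
    """Weighted squared reconstruction error of the rank-1 fit."""
    return sum(
        W[i][j] * (X[i][j] - u[i] * v[j]) ** 2
        for i in range(len(X))
        for j in range(len(X[0]))
    )

# Outer product of (1, 2, 3) and (1, 2, 4) with two entries treated as missing
# (weight 0); the zeros stored at those positions are placeholders.
X = [[1.0, 2.0, 0.0], [2.0, 4.0, 8.0], [0.0, 6.0, 12.0]]
W = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
u, v = weighted_rank1(X, W)
print(weighted_loss(X, W, u, v), u[0] * v[2], u[2] * v[0])
```

Because the zero-weight entries do not enter the loss, the fitted product u_i * v_j at those positions acts as an imputed value, which is how binary weights recover the completion setting.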

Machine translation, for reference only.
