
Machine Learning Academic Digest [9.15]

arXiv每日学术速递

Update! The H5 page now supports collapsible abstracts for a better reading experience! Click "Read the original" to visit arxivdaily.com, which covers CS | Physics | Math | Economics | Statistics | Finance | Biology | Electrical Engineering, with search, bookmarking, and more!


cs.LG: 95 papers today


Graph-related (graph learning | graph neural networks | graph optimization, etc.) (3 papers)

【1】 IGNNITION: Bridging the Gap Between Graph Neural Networks and Networking Systems
Link: https://arxiv.org/abs/2109.06715

Authors: David Pujol-Perich, José Suárez-Varela, Miquel Ferriol, Shihan Xiao, Bo Wu, Albert Cabellos-Aparicio, Pere Barlet-Ros
Note: Accepted for publication at IEEE Network Magazine
Abstract: Recent years have seen the vast potential of Graph Neural Networks (GNN) in many fields where data is structured as graphs (e.g., chemistry, recommender systems). In particular, GNNs are becoming increasingly popular in the field of networking, as graphs are intrinsically present at many levels (e.g., topology, routing). The main novelty of GNNs is their ability to generalize to other networks unseen during training, which is an essential feature for developing practical Machine Learning (ML) solutions for networking. However, implementing a functional GNN prototype is currently a cumbersome task that requires strong skills in neural network programming. This poses an important barrier to network engineers that often do not have the necessary ML expertise. In this article, we present IGNNITION, a novel open-source framework that enables fast prototyping of GNNs for networking systems. IGNNITION is based on an intuitive high-level abstraction that hides the complexity behind GNNs, while still offering great flexibility to build custom GNN architectures. To showcase the versatility and performance of this framework, we implement two state-of-the-art GNN models applied to different networking use cases. Our results show that the GNN models produced by IGNNITION are equivalent in terms of accuracy and performance to their native implementations in TensorFlow.
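For readers new to GNNs, the sketch below shows the generic message-passing step that frameworks like IGNNITION abstract away. It is an illustrative vanilla message-passing layer in PyTorch, not IGNNITION's actual interface (the framework is driven by a high-level model description rather than hand-written layers like this one).

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One generic message-passing step: the boilerplate a framework
    like IGNNITION hides behind its high-level abstraction."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)   # message function m(h_src, h_dst)
        self.upd = nn.GRUCell(dim, dim)      # node-update function

    def forward(self, h, edges):
        # h: (n_nodes, dim) node states; edges: (2, n_edges) rows = (src, dst)
        src, dst = edges
        messages = torch.relu(self.msg(torch.cat([h[src], h[dst]], dim=-1)))
        agg = torch.zeros_like(h).index_add_(0, dst, messages)  # sum incoming messages
        return self.upd(agg, h)              # new hidden state for every node
```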


【2】 Instance-wise Graph-based Framework for Multivariate Time Series Forecasting
Link: https://arxiv.org/abs/2109.06489

Authors: Wentao Xu, Weiqing Liu, Jiang Bian, Jian Yin, Tie-Yan Liu
Affiliations: School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; Microsoft Research Asia, Beijing, China; School of Artificial Intelligence, Sun Yat-sen University, Zhuhai, China
Abstract: Multivariate time series forecasting has attracted increasing attention because of its vital role in different real-world fields, such as finance, traffic, and weather. In recent years, many research efforts have been devoted to forecasting multivariate time series. Although some previous work considers the interdependencies among different variables at the same timestamp, existing work overlooks the inter-connections between different variables at different timestamps. In this paper, we propose a simple yet efficient instance-wise graph-based framework that utilizes the inter-dependencies of different variables at different timestamps for multivariate time series forecasting. The key idea of our framework is to aggregate information from the historical time series of different variables into the current time series that we need to forecast. We conduct experiments on the Traffic, Electricity, and Exchange-Rate multivariate time series datasets. The results show that our proposed model outperforms state-of-the-art baseline methods.


【3】 Deep Generative Models to Extend Active Directory Graphs with Honeypot Users
Link: https://arxiv.org/abs/2109.06180

Authors: Ondrej Lukas, Sebastian Garcia
Affiliations: Czech Technical University, Prague, Czech Republic
Abstract: Active Directory (AD) is a crucial element of large organizations, given its central role in managing access to resources. Since AD is used by all users in the organization, it is hard to detect attackers. We propose to generate and place fake users (honeyusers) in AD structures to help detect attacks. However, not any honeyuser will attract attackers. Our method generates honeyusers with a Variational Autoencoder that enriches the AD structure with well-positioned honeyusers. It first learns the embeddings of the original nodes and edges in the AD, then it uses a modified Bidirectional DAG-RNN to encode the parameters of the probability distribution of the latent space of node representations. Finally, it samples nodes from this distribution and uses an MLP to decide where the nodes are connected. The model was evaluated by the similarity of the generated AD with the original, by the positions of the new nodes, by the similarity with GraphRNN and finally by making real intruders attack the generated AD structure to see if they select the honeyusers. Results show that our machine learning model is good enough to generate well-placed honeyusers for existing AD structures so that intruders are lured into them.


Transformer (1 paper)

【1】 Vision Transformer for Learning Driving Policies in Complex Multi-Agent Environments
Link: https://arxiv.org/abs/2109.06514

Authors: Eshagh Kargar, Ville Kyrki
Affiliations: School of Electrical Engineering, Aalto University
Abstract: Driving in a complex urban environment is a difficult task that requires a complex decision policy. In order to make informed decisions, one needs to gain an understanding of the long-range context and the importance of other vehicles. In this work, we propose to use Vision Transformer (ViT) to learn a driving policy in urban settings with birds-eye-view (BEV) input images. The ViT network learns the global context of the scene more effectively than earlier proposed Convolutional Neural Networks (ConvNets). Furthermore, ViT's attention mechanism helps to learn an attention map for the scene, which allows the ego car to determine which surrounding cars are important to its next decision. We demonstrate that a DQN agent with a ViT backbone outperforms baseline algorithms with ConvNet backbones pre-trained in various ways. In particular, the proposed method helps reinforcement learning algorithms to learn faster, with increased performance and less data than baselines.
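As a concrete picture of this setup, below is a minimal sketch of a Q-network with a ViT backbone over BEV images. All sizes, depths, and the mean-pooled Q-head are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ViTQNetwork(nn.Module):
    """Sketch of a DQN Q-network with a ViT backbone over bird's-eye-view images."""
    def __init__(self, img_size=96, patch=8, dim=128, depth=4, heads=4,
                 in_ch=3, n_actions=5):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        self.patchify = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.pos_emb = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.q_head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, n_actions))

    def forward(self, bev):                       # bev: (B, C, H, W) BEV image
        tokens = self.patchify(bev).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = self.encoder(tokens + self.pos_emb)
        return self.q_head(tokens.mean(dim=1))    # one Q-value per action
```

The per-token self-attention inside the encoder is what yields the scene attention map the abstract refers to.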


GAN | adversarial | attack | generation (12 papers)

【1】 Automatic hippocampal surface generation via 3D U-net and active shape modeling with hybrid particle swarm optimization
Link: https://arxiv.org/abs/2109.06817

Authors: Pinyuan Zhong, Yue Zhang, Xiaoying Tang
Affiliations: Department of Electrical and Electronic Engineering, The University of Hong Kong
Abstract: In this paper, we proposed and validated a fully automatic pipeline for hippocampal surface generation via 3D U-net coupled with active shape modeling (ASM). Principally, the proposed pipeline consisted of three steps. In the beginning, for each magnetic resonance image, a 3D U-net was employed to obtain the automatic hippocampus segmentation at each hemisphere. Secondly, ASM was performed on a group of pre-obtained template surfaces to generate mean shape and shape variation parameters through principal component analysis. Ultimately, hybrid particle swarm optimization was utilized to search for the optimal shape variation parameters that best match the segmentation. The hippocampal surface was then generated from the mean shape and the shape variation parameters. The proposed pipeline was observed to provide hippocampal surfaces at both hemispheres with high accuracy, correct anatomical topology, and sufficient smoothness.
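The ASM step is straightforward to sketch: PCA over a set of flattened template surfaces yields a mean shape plus variation modes, and a new surface is regenerated from a parameter vector b. The numpy sketch below covers only this shape model; the hybrid PSO search for the b that best matches the U-net segmentation is omitted.

```python
import numpy as np

def build_shape_model(template_surfaces, n_modes=10):
    """PCA over flattened template surfaces (n_templates, 3 * n_points):
    returns the mean shape and the leading shape-variation modes."""
    mean = template_surfaces.mean(axis=0)
    _, _, vt = np.linalg.svd(template_surfaces - mean, full_matrices=False)
    return mean, vt[:n_modes].T               # modes: (3 * n_points, n_modes)

def generate_surface(mean, modes, b):
    """A surface is the mean shape deformed along the modes by parameters b,
    which the pipeline searches with hybrid particle swarm optimization."""
    return mean + modes @ b
```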


【2】 PETGEN: Personalized Text Generation Attack on Deep Sequence Embedding-based Classification Models
Link: https://arxiv.org/abs/2109.06777

Authors: Bing He, Mustaque Ahamad, Srijan Kumar
Affiliations: Georgia Institute of Technology, Atlanta, Georgia, USA
Note: Accepted for publication at: 2021 ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'2021). Code and data at: this https URL
Abstract: "What should a malicious user write next to fool a detection model?" Identifying malicious users is critical to ensure the safety and integrity of internet platforms. Several deep learning based detection models have been created. However, malicious users can evade deep detection models by manipulating their behavior, rendering these models of little use. The vulnerability of such deep detection models against adversarial attacks is unknown. Here we create a novel adversarial attack model against deep user sequence embedding-based classification models, which use the sequence of user posts to generate user embeddings and detect malicious users. In the attack, the adversary generates a new post to fool the classifier. We propose a novel end-to-end Personalized Text Generation Attack model, called PETGEN, that simultaneously reduces the efficacy of the detection model and generates posts that have several key desirable properties. Specifically, PETGEN generates posts that are personalized to the user's writing style, have knowledge about a given target context, are aware of the user's historical posts on the target context, and encapsulate the user's recent topical interests. We conduct extensive experiments on two real-world datasets (Yelp and Wikipedia, both with ground-truth of malicious users) to show that PETGEN significantly reduces the performance of popular deep user sequence embedding-based classification models. PETGEN outperforms five attack baselines in terms of text quality and attack efficacy in both white-box and black-box classifier settings. Overall, this work paves the path towards the next generation of adversary-aware sequence classification models.


【3】 Expert Knowledge-Guided Length-Variant Hierarchical Label Generation for Proposal Classification
Link: https://arxiv.org/abs/2109.06661

Authors: Meng Xiao, Ziyue Qiao, Yanjie Fu, Yi Du, Pengyang Wang
Affiliations: Computer Network Information Center, Chinese Academy of Sciences, Beijing; University of Chinese Academy of Sciences, Beijing; Department of Computer Science, University of Central Florida, Orlando; University of Macau, Macau
Abstract: To advance the development of science and technology, research proposals are submitted to open-court competitive programs developed by government agencies (e.g., NSF). Proposal classification is one of the most important tasks to achieve effective and fair review assignments. Proposal classification aims to classify a proposal into a length-variant sequence of labels. In this paper, we formulate the proposal classification problem into a hierarchical multi-label classification task. Although there are certain prior studies, proposal classification exhibits unique features: 1) the classification result of a proposal is in a hierarchical discipline structure with different levels of granularity; 2) proposals contain multiple types of documents; 3) domain experts can empirically provide partial labels that can be leveraged to improve task performance. In this paper, we focus on developing a new deep proposal classification framework to jointly model the three features. In particular, to sequentially generate labels, we leverage previously-generated labels to predict the label of the next level; to integrate partial labels from experts, we use the embedding of these empirical partial labels to initialize the state of neural networks. Our model can automatically identify the best length of label sequence to stop next-label prediction. Finally, we present extensive results to demonstrate that our method can jointly model partial labels, textual information, and semantic dependencies in label sequences, and, thus, achieve advanced performance.


【4】 Conditional Synthetic Data Generation for Robust Machine Learning Applications with Limited Pandemic Data
Link: https://arxiv.org/abs/2109.06486

Authors: Hari Prasanna Das, Ryan Tran, Japjot Singh, Xiangyu Yue, Geoff Tison, Alberto Sangiovanni-Vincentelli, Costas J. Spanos
Affiliations: Department of Electrical Engineering and Computer Sciences, University of California, Berkeley; Division of Cardiology, University of California, San Francisco (UCSF)
Abstract: Background: At the onset of a pandemic, such as COVID-19, data with proper labeling/attributes corresponding to the new disease might be unavailable or sparse. Machine Learning (ML) models trained with the available data, which is limited in quantity and poor in diversity, will often be biased and inaccurate. At the same time, ML algorithms designed to fight pandemics must have good performance and be developed in a time-sensitive manner. To tackle the challenges of limited data, and label scarcity in the available data, we propose generating conditional synthetic data, to be used alongside real data for developing robust ML models. Methods: We present a hybrid model consisting of a conditional generative flow and a classifier for conditional synthetic data generation. The classifier decouples the feature representation for the condition, which is fed to the flow to extract the local noise. We generate synthetic data by manipulating the local noise with fixed conditional feature representation. We also propose a semi-supervised approach to generate synthetic samples in the absence of labels for a majority of the available data. Results: We performed conditional synthetic generation for chest computed tomography (CT) scans corresponding to normal, COVID-19, and pneumonia afflicted patients. We show that our method significantly outperforms existing models both on qualitative and quantitative performance, and our semi-supervised approach can efficiently synthesize conditional samples under label scarcity. As an example of downstream use of synthetic data, we show improvement in COVID-19 detection from CT scans with conditional synthetic data augmentation.


【5】 Dodging Attack Using Carefully Crafted Natural Makeup
Link: https://arxiv.org/abs/2109.06467

Authors: Nitzan Guetta, Asaf Shabtai, Inderjeet Singh, Satoru Momiyama, Yuval Elovici
Affiliations: Ben-Gurion University of the Negev; NEC Corporation
Abstract: Deep learning face recognition models are used by state-of-the-art surveillance systems to identify individuals passing through public areas (e.g., airports). Previous studies have demonstrated the use of adversarial machine learning (AML) attacks to successfully evade identification by such systems, both in the digital and physical domains. Attacks in the physical domain, however, require significant manipulation to the human participant's face, which can raise suspicion by human observers (e.g., airport security officers). In this study, we present a novel black-box AML attack which carefully crafts natural makeup, which, when applied on a human participant, prevents the participant from being identified by facial recognition models. We evaluated our proposed attack against the ArcFace face recognition model, with 20 participants in a real-world setup that includes two cameras, different shooting angles, and different lighting conditions. The evaluation results show that in the digital domain, the face recognition system was unable to identify all of the participants, while in the physical domain, the face recognition system was able to identify the participants in only 1.22% of the frames (compared to 47.57% without makeup and 33.73% with random natural makeup), which is below a reasonable threshold of a realistic operational environment.


【6】 Structure-Enhanced Pop Music Generation via Harmony-Aware Learning
Link: https://arxiv.org/abs/2109.06441

Authors: Xueyao Zhang, Jinchao Zhang, Yao Qiu, Li Wang, Jie Zhou
Affiliations: Institute of Computing Technology, Chinese Academy of Sciences; Pattern Recognition Center, WeChat AI, Tencent Inc., China; Communication University of China; University of Chinese Academy of Sciences
Note: Under review
Abstract: Automatically composing pop music with a satisfactory structure is an attractive but challenging topic. Although musical structure is easy for humans to perceive, it is difficult to describe clearly and define accurately, and how to model structure in pop music generation remains far from solved. In this paper, we propose to leverage harmony-aware learning for structure-enhanced pop music generation. On the one hand, one of the participants of harmony, the chord, represents the harmonic set of multiple notes, which is integrated closely with the spatial structure of music, texture. On the other hand, the other participant of harmony, chord progression, usually accompanies the development of the music, which promotes the temporal structure of music, form. Besides, when chords evolve into chord progression, the texture and the form can be bridged by the harmony naturally, which contributes to the joint learning of the two structures. Furthermore, we propose the Harmony-Aware Hierarchical Music Transformer (HAT), which can exploit the structure adaptively from the music, and interact on the music tokens at multiple levels to enhance the signals of the structure in various musical elements. Results of subjective and objective evaluations demonstrate that HAT significantly improves the quality of generated music, especially in the structureness.


【7】 Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation
Link: https://arxiv.org/abs/2109.06379

Authors: Mingkai Deng, Bowen Tan, Zhengzhong Liu, Eric P. Xing, Zhiting Hu
Affiliations: Carnegie Mellon University; Petuum Inc.; MBZUAI; UC San Diego (*equal contribution)
Note: EMNLP 2021, Code available at this https URL
Abstract: Natural language generation (NLG) spans a broad range of tasks, each of which serves specific objectives and desires different properties of the generated text. This complexity makes automatic evaluation of NLG particularly challenging. Previous work has typically focused on a single task and developed individual evaluation metrics based on specific intuitions. In this paper, we propose a unifying perspective based on the nature of information change in NLG tasks, including compression (e.g., summarization), transduction (e.g., text rewriting), and creation (e.g., dialog). Information alignment between input, context, and output text plays a common central role in characterizing the generation. With automatic alignment prediction models, we develop a family of interpretable metrics that are suitable for evaluating key aspects of different NLG tasks, often without the need for gold reference data. Experiments show the uniformly designed metrics achieve stronger or comparable correlations with human judgement compared to state-of-the-art metrics in each of the diverse tasks, including text summarization, style transfer, and knowledge-grounded dialog.


【8】 Sensor Adversarial Traits: Analyzing Robustness of 3D Object Detection Sensor Fusion Models
Link: https://arxiv.org/abs/2109.06363

Authors: Won Park, Nan Li, Qi Alfred Chen, Z. Morley Mao
Affiliations: University of Michigan; UC Irvine
Abstract: A critical aspect of autonomous vehicles (AVs) is the object detection stage, which is increasingly being performed with sensor fusion models: multimodal 3D object detection models which utilize both 2D RGB image data and 3D data from a LIDAR sensor as inputs. In this work, we perform the first study to analyze the robustness of a high-performance, open source sensor fusion model architecture towards adversarial attacks and challenge the popular belief that the use of additional sensors automatically mitigates the risk of adversarial attacks. We find that despite the use of a LIDAR sensor, the model is vulnerable to our purposefully crafted image-based adversarial attacks including disappearance, universal patch, and spoofing. After identifying the underlying reason, we explore some potential defenses and provide some recommendations for improved sensor fusion models.


【9】 A Practical Adversarial Attack on Contingency Detection of Smart Energy Systems
Link: https://arxiv.org/abs/2109.06358

Authors: Moein Sabounchi, Jin Wei-Kocsis
Affiliations: Department of Computer and Information Technology, Purdue University
Note: 5 pages, 6 figures
Abstract: Due to the advances in computing and sensing, deep learning (DL) has widely been applied in smart energy systems (SESs). These DL-based solutions have proved their potential in improving the effectiveness and adaptiveness of the control systems. However, in recent years, increasing evidence shows that DL techniques can be manipulated by adversarial attacks with carefully-crafted perturbations. Adversarial attacks have been studied in computer vision and natural language processing. However, there is very limited work focusing on adversarial attack deployment and mitigation in energy systems. In this regard, to better prepare SESs against potential adversarial attacks, we propose an innovative adversarial attack model that can practically compromise the dynamical controls of energy systems. We also optimize the deployment of the proposed adversarial attack model by employing deep reinforcement learning (RL) techniques. In this paper, we present our first-stage work in this direction. In the simulation section, we evaluate the performance of our proposed adversarial attack model using the standard IEEE 9-bus system.


【10】 TREATED: Towards Universal Defense against Textual Adversarial Attacks
Link: https://arxiv.org/abs/2109.06176

Authors: Bin Zhu, Zhaoquan Gu, Le Wang, Zhihong Tian
Affiliations: Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
Abstract: Recent work shows that deep neural networks are vulnerable to adversarial examples. Much work studies adversarial example generation, while very little work focuses on the more critical adversarial defense. Existing adversarial detection methods usually make assumptions about the adversarial example and attack method (e.g., the word frequency of the adversarial example, the perturbation level of the attack method). However, this limits the applicability of the detection method. To this end, we propose TREATED, a universal adversarial detection method that can defend against attacks of various perturbation levels without making any assumptions. TREATED identifies adversarial examples through a set of well-designed reference models. Extensive experiments on three competitive neural networks and two widely used datasets show that our method achieves better detection performance than baselines. We finally conduct ablation studies to verify the effectiveness of our method.


【11】 ImUnity: a generalizable VAE-GAN solution for multicenter MR image harmonization
Link: https://arxiv.org/abs/2109.06756

Authors: Stenzel Cackowski, Emmanuel L. Barbier, Michel Dojat, Thomas Christen
Affiliations: Université Grenoble Alpes
Note: 15 pages, 7 figures
Abstract: ImUnity is an original deep-learning model designed for efficient and flexible MR image harmonization. A VAE-GAN network, coupled with a confusion module and an optional biological preservation module, uses multiple 2D slices taken from different anatomical locations in each subject of the training database, as well as image contrast transformations, for its self-supervised training. It eventually generates 'corrected' MR images that can be used for various multi-center population studies. Using 3 open source databases (ABIDE, OASIS and SRPBS), which contain MR images from multiple acquisition scanner types or vendors and a wide range of subject ages, we show that ImUnity: (1) outperforms state-of-the-art methods in terms of quality of images generated using traveling subjects; (2) removes site or scanner biases while improving patient classification; (3) harmonizes data coming from new sites or scanners without the need for additional fine-tuning; and (4) allows the selection of multiple MR reconstructed images according to the desired applications. Tested here on T1-weighted images, ImUnity could be used to harmonize other types of medical images.


【12】 An Apparatus for the Simulation of Breathing Disorders: Physically Meaningful Generation of Surrogate Data
Link: https://arxiv.org/abs/2109.06699

Authors: Harry J. Davies, Ghena Hammour, Danilo P. Mandic
Abstract: Whilst debilitating breathing disorders, such as chronic obstructive pulmonary disease (COPD), are rapidly increasing in prevalence, we witness a continued integration of artificial intelligence into healthcare. While this promises improved detection and monitoring of breathing disorders, AI techniques are "data hungry", which highlights the importance of generating physically meaningful surrogate data. Such domain knowledge aware surrogates would enable both an improved understanding of respiratory waveform changes with different breathing disorders and different severities, and enhance the training of machine learning algorithms. To this end, we introduce an apparatus comprising PVC tubes and 3D printed parts as a simple yet effective method of simulating both obstructive and restrictive respiratory waveforms in healthy subjects. Independent control over both inspiratory and expiratory resistances allows for the simulation of obstructive breathing disorders through the whole spectrum of FEV1/FVC spirometry ratios (used to classify COPD), ranging from healthy values to values seen in severe chronic obstructive pulmonary disease. Moreover, waveform characteristics of breathing disorders, such as a change in inspiratory duty cycle or peak flow, are also observed in the waveforms resulting from use of the artificial breathing disorder simulation apparatus. Overall, the proposed apparatus provides us with a simple, effective and physically meaningful way to generate surrogate breathing disorder waveforms, a prerequisite for the use of artificial intelligence in respiratory health.


Semi-/weakly-/un-/fully-supervised | uncertainty | active learning (8 papers)

【1】 Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
Link: https://arxiv.org/abs/2109.06870

Authors: Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi
Affiliations: ASAPP Inc.; Cornell University
Note: Code available at this https URL
Abstract: This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. For example, under the 100h-960h semi-supervised setup on LibriSpeech, SEW achieves a 1.9x inference speedup compared to wav2vec 2.0, with a 13.5% relative reduction in word error rate. With a similar inference time, SEW reduces word error rate by 25-50% across different model sizes.


【2】 LM-Critic: Language Models for Unsupervised Grammatical Error Correction
Link: https://arxiv.org/abs/2109.06822

Authors: Michihiro Yasunaga, Jure Leskovec, Percy Liang
Affiliations: Stanford University
Note: EMNLP 2021. Code & data available at this https URL
Abstract: Training a model for grammatical error correction (GEC) requires a set of labeled ungrammatical / grammatical sentence pairs, but manually annotating such pairs can be expensive. Recently, the Break-It-Fix-It (BIFI) framework has demonstrated strong results on learning to repair a broken program without any labeled examples, but this relies on a perfect critic (e.g., a compiler) that returns whether an example is valid or not, which does not exist for the GEC task. In this work, we show how to leverage a pretrained language model (LM) in defining an LM-Critic, which judges a sentence to be grammatical if the LM assigns it a higher probability than its local perturbations. We apply this LM-Critic and BIFI along with a large set of unlabeled sentences to bootstrap realistic ungrammatical / grammatical pairs for training a corrector. We evaluate our approach on GEC datasets across multiple domains (CoNLL-2014, BEA-2019, GMEG-wiki and GMEG-yahoo) and show that it outperforms existing methods in both the unsupervised setting (+7.7 F0.5) and the supervised setting (+0.5 F0.5).
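The critic itself is easy to prototype: score a sentence and its local perturbations with a pretrained LM, and accept the sentence only if it outscores its neighbourhood. The sketch below uses GPT-2 via Hugging Face transformers; the perturbation set here (single-word deletions and adjacent swaps) is a simplified stand-in for the paper's perturbation function.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def logprob(sentence):
    """Total log-probability the LM assigns to the sentence."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        out = lm(ids, labels=ids)          # out.loss = mean NLL per predicted token
    return -out.loss.item() * (ids.size(1) - 1)

def local_perturbations(sentence):
    """Toy neighbourhood: drop one word, or swap two adjacent words."""
    words = sentence.split()
    for i in range(len(words)):
        if len(words) > 1:
            yield " ".join(words[:i] + words[i + 1:])
        if i + 1 < len(words):
            yield " ".join(words[:i] + [words[i + 1], words[i]] + words[i + 2:])

def lm_critic(sentence):
    """Judge grammatical iff the sentence outscores all local perturbations."""
    neighbours = [logprob(p) for p in local_perturbations(sentence)]
    return not neighbours or logprob(sentence) >= max(neighbours)
```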


【3】 Learning to Navigate Intersections with Unsupervised Driver Trait Inference
Link: https://arxiv.org/abs/2109.06783

Authors: Shuijing Liu, Peixin Chang, Haonan Chen, Neeloy Chakraborty, Katherine Driggs-Campbell
Affiliations: Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
Abstract: Navigation through uncontrolled intersections is one of the key challenges for autonomous vehicles. Identifying the subtle differences in hidden traits of other drivers can bring significant benefits when navigating in such environments. We propose an unsupervised method for inferring driver traits such as driving styles from observed vehicle trajectories. We use a variational autoencoder with recurrent neural networks to learn a latent representation of traits without any ground truth trait labels. Then, we use this trait representation to learn a policy for an autonomous vehicle to navigate through a T-intersection with deep reinforcement learning. Our pipeline enables the autonomous vehicle to adjust its actions when dealing with drivers of different traits to ensure safety and efficiency. Our method demonstrates promising performance and outperforms state-of-the-art baselines in the T-intersection scenario.
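The trait-inference half of the pipeline can be sketched as a recurrent VAE encoder: a GRU summarizes an observed trajectory into the parameters of a Gaussian latent "trait". The dimensions and single-layer GRU below are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TrajectoryTraitEncoder(nn.Module):
    """Sketch: encode an observed vehicle trajectory into a latent trait z."""
    def __init__(self, obs_dim=4, hidden=64, latent=8):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)

    def forward(self, traj):                  # traj: (B, T, obs_dim)
        _, h = self.rnn(traj)                 # final hidden state summarizes the trajectory
        h = h[-1]                             # (B, hidden)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return z, mu, logvar                  # z is fed to the RL navigation policy
```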


【4】 Non-Parametric Unsupervised Domain Adaptation for Neural Machine Translation
Link: https://arxiv.org/abs/2109.06604

Authors: Xin Zheng, Zhirui Zhang, Shujian Huang, Boxing Chen, Jun Xie, Weihua Luo, Jiajun Chen
Affiliations: National Key Laboratory for Novel Software Technology, Nanjing University, China; Language Technology Lab, Alibaba DAMO Academy; Peng Cheng Laboratory, China
Note: Findings of EMNLP 2021
Abstract: Recently, kNN-MT has shown the promising capability of directly incorporating the pre-trained neural machine translation (NMT) model with domain-specific token-level k-nearest-neighbor (kNN) retrieval to achieve domain adaptation without retraining. Despite being conceptually attractive, it heavily relies on high-quality in-domain parallel corpora, limiting its capability on unsupervised domain adaptation, where in-domain parallel corpora are scarce or nonexistent. In this paper, we propose a novel framework that directly uses in-domain monolingual sentences in the target language to construct an effective datastore for k-nearest-neighbor retrieval. To this end, we first introduce an autoencoder task based on the target language, and then insert lightweight adapters into the original NMT model to map the token-level representation of this task to the ideal representation of translation task. Experiments on multi-domain datasets demonstrate that our proposed approach significantly improves the translation accuracy with target-side monolingual data, while achieving comparable performance with back-translation.
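For context, here is the generic token-level kNN-MT retrieval that such a datastore serves, in brute-force numpy form. The paper's actual contribution (building the datastore from target-side monolingual text via an autoencoder task and lightweight adapters) is not reproduced; the temperature, k, and interpolation weight lam are illustrative.

```python
import numpy as np

class KNNDatastore:
    """Generic token-level kNN-MT datastore: keys are decoder hidden states,
    values are the next tokens observed at those states."""
    def __init__(self, keys, values):
        self.keys = keys                # (N, d) float array
        self.values = values            # (N,) int array of token ids

    def retrieve(self, query, k=8, temperature=10.0, vocab_size=32000):
        d2 = ((self.keys - query) ** 2).sum(axis=1)      # squared L2 distances
        idx = np.argpartition(d2, k)[:k]                 # k nearest entries
        w = np.exp(-d2[idx] / temperature)
        p_knn = np.zeros(vocab_size)
        np.add.at(p_knn, self.values[idx], w / w.sum())  # distance-weighted vote
        return p_knn

def interpolate(p_model, p_knn, lam=0.5):
    """Final next-token distribution mixes the NMT model with kNN retrieval."""
    return (1 - lam) * p_model + lam * p_knn
```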


【5】 Knowledge-guided Self-supervised Learning for estimating River-Basin Characteristics
Link: https://arxiv.org/abs/2109.06429

Authors: Rahul Ghosh, Arvind Renganathan, Ankush Khandelwal, Xiaowei Jia, Xiang Li, John Neiber, Chris Duffy, Vipin Kumar
Affiliations: University of Minnesota; University of Pittsburgh; Penn State University
Note: Submitted to Science-Guided AI, SGAI-AAAI-21
Abstract: Machine Learning is being extensively used in hydrology, especially streamflow prediction of basins/watersheds. Basin characteristics are essential for modeling the rainfall-runoff response of these watersheds and therefore data-driven methods must take into account this ancillary characteristics data. However, there are several limitations, namely uncertainty in the measured characteristics, partially missing characteristics for some of the basins, or unknown characteristics that may not be present in the known measured set. In this paper we present an inverse model that uses a knowledge-guided self-supervised learning algorithm to infer basin characteristics using the meteorological drivers and streamflow response data. We evaluate our model on the CAMELS dataset and the results validate its ability to reduce measurement uncertainty, impute missing characteristics, and identify unknown characteristics.


【6】 Uncertainty-Aware Machine Translation Evaluation
Link: https://arxiv.org/abs/2109.06352

Authors: Taisiya Glushkova, Chrysoula Zerva, Ricardo Rei, André F. T. Martins
Affiliations: Instituto de Telecomunicações; Unbabel; INESC-ID; Instituto Superior Técnico & LUMLIS (Lisbon ELLIS Unit)
Note: Accepted to Findings of EMNLP 2021
Abstract: Several neural-based metrics have been recently proposed to evaluate machine translation quality. However, all of them resort to point estimates, which provide limited information at segment level. This is made worse as they are trained on noisy, biased and scarce human judgements, often resulting in unreliable quality predictions. In this paper, we introduce uncertainty-aware MT evaluation and analyze the trustworthiness of the predicted quality. We combine the COMET framework with two uncertainty estimation methods, Monte Carlo dropout and deep ensembles, to obtain quality scores along with confidence intervals. We compare the performance of our uncertainty-aware MT evaluation methods across multiple language pairs from the QT21 dataset and the WMT20 metrics task, augmented with MQM annotations. We experiment with varying numbers of references and further discuss the usefulness of uncertainty-aware quality estimation (without references) to flag possibly critical translation mistakes.
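Of the two uncertainty estimators, Monte Carlo dropout is simple to sketch: keep dropout active at test time, run several stochastic forward passes, and report the spread as a confidence interval. The generic PyTorch sketch below assumes a regression-style quality-estimation model; it is not the COMET codebase.

```python
import torch
import torch.nn as nn

def mc_dropout_score(model, batch, n_samples=30):
    """Mean quality score plus a ~95% interval from MC-dropout spread
    (the Gaussian interval is an illustrative assumption)."""
    model.eval()
    for m in model.modules():               # re-enable only the dropout layers
        if isinstance(m, nn.Dropout):
            m.train()
    with torch.no_grad():
        scores = torch.stack([model(batch) for _ in range(n_samples)])
    mean, std = scores.mean(dim=0), scores.std(dim=0)
    return mean, (mean - 1.96 * std, mean + 1.96 * std)
```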


【7】 Mitigating Sampling Bias and Improving Robustness in Active Learning
Link: https://arxiv.org/abs/2109.06321

Authors: Ranganath Krishnan, Alok Sinha, Nilesh Ahuja, Mahesh Subedar, Omesh Tickoo, Ravi Iyer
Note: Human in the Loop Learning workshop at International Conference on Machine Learning (ICML 2021)
Abstract: This paper presents simple and efficient methods to mitigate sampling bias in active learning while achieving state-of-the-art accuracy and model robustness. We introduce supervised contrastive active learning by leveraging the contrastive loss for active learning under a supervised setting. We propose an unbiased query strategy that selects informative data samples of diverse feature representations with our methods: supervised contrastive active learning (SCAL) and deep feature modeling (DFM). We empirically demonstrate our proposed methods reduce sampling bias, achieve state-of-the-art accuracy and model calibration in an active learning setup, with the query computation 26x faster than Bayesian active learning by disagreement and 11x faster than CoreSet. The proposed SCAL method outperforms by a large margin in robustness to dataset shift and out-of-distribution data.


【8】 Pre-emptive learning-to-defer for sequential medical decision-making under uncertainty
Link: https://arxiv.org/abs/2109.06312

Authors: Shalmali Joshi, Sonali Parbhoo, Finale Doshi-Velez
Abstract: We propose SLTD ('Sequential Learning-to-Defer'), a framework for learning to defer pre-emptively to an expert in sequential decision-making settings. SLTD measures the likelihood of improving value of deferring now versus later based on the underlying uncertainty in dynamics. In particular, we focus on the non-stationarity in the dynamics to accurately learn the deferral policy. We demonstrate that our pre-emptive deferral can identify regions where the current policy has a low probability of improving outcomes. SLTD outperforms existing non-sequential learning-to-defer baselines, whilst reducing overall uncertainty on multiple synthetic and real-world simulators with non-stationary dynamics. We further derive and decompose the propagated (long-term) uncertainty for interpretation by the domain expert to provide an indication of when the model's performance is reliable.


Transfer | Zero/Few/One-Shot | adaptation (4 papers)

【1】 One-Class Meta-Learning: Towards Generalizable Few-Shot Open-Set Classification
Link: https://arxiv.org/abs/2109.06859

Authors: Jedrzej Kozerawski, Matthew Turk
Affiliations: Department of Electrical & Computer Engineering, University of California, Santa Barbara; Toyota Technological Institute at Chicago
Note: 21 pages, submitted to BMVC 2021
Abstract: Real-world classification tasks are frequently required to work in an open-set setting. This is especially challenging for few-shot learning problems due to the small sample size for each known category, which prevents existing open-set methods from working effectively; however, most multiclass few-shot methods are limited to closed-set scenarios. In this work, we address the problem of few-shot open-set classification by first proposing methods for few-shot one-class classification and then extending them to few-shot multiclass open-set classification. We introduce two independent few-shot one-class classification methods: Meta Binary Cross-Entropy (Meta-BCE), which learns a separate feature representation for one-class classification, and One-Class Meta-Learning (OCML), which learns to generate one-class classifiers given standard multiclass feature representation. Both methods can augment any existing few-shot learning method without requiring retraining to work in a few-shot multiclass open-set setting without degrading its closed-set performance. We demonstrate the benefits and drawbacks of both methods in different problem settings and evaluate them on three standard benchmark datasets, miniImageNet, tieredImageNet, and Caltech-UCSD-Birds-200-2011, where they surpass the state-of-the-art methods in the few-shot multiclass open-set and few-shot one-class tasks.


【2】 Few-shot Quality-Diversity Optimisation
Link: https://arxiv.org/abs/2109.06826

Authors: Achkan Salehi, Alexandre Coninx, Stephane Doncieux
Note: 8 pages, 3 figures, 2 tables
Abstract: In the past few years, a considerable amount of research has been dedicated to the exploitation of previous learning experiences and the design of few-shot and meta learning approaches, in problem domains ranging from computer vision to reinforcement learning based control. A notable exception, where, to the best of our knowledge, little to no effort has been made in this direction, is Quality-Diversity (QD) optimisation. QD methods have been shown to be effective tools in dealing with deceptive minima and sparse rewards in reinforcement learning. However, they remain costly due to their reliance on inherently sample-inefficient evolutionary processes. We show that, given examples from a task distribution, information about the paths taken by optimisation in parameter space can be leveraged to build a prior population, which when used to initialise QD methods in unseen environments, allows for few-shot adaptation. Our proposed method does not require backpropagation. It is simple to implement and scale, and furthermore, it is agnostic to the underlying models that are being trained. Experiments carried out in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that it considerably reduces the number of generations that are required for QD optimisation in these environments.


【3】 Complexity-aware Adaptive Training and Inference for Edge-Cloud Distributed AI Systems
Link: https://arxiv.org/abs/2109.06440

Authors: Yinghan Long, Indranil Chakraborty, Gopalakrishnan Srinivasan, Kaushik Roy
Affiliations: School of Electrical and Computer Engineering, Purdue University
Note: 41st IEEE International Conference on Distributed Computing Systems, 2021
Abstract: The ubiquitous use of IoT and machine learning applications is creating large amounts of data that require accurate and real-time processing. Although edge-based smart data processing can be enabled by deploying pretrained models, the energy and memory constraints of edge devices necessitate distributed deep learning between the edge and the cloud for complex data. In this paper, we propose a distributed AI system to exploit both the edge and the cloud for training and inference. We propose a new architecture, MEANet, with a main block, an extension block, and an adaptive block for the edge. The inference process can terminate at either the main block, the extension block, or the cloud. MEANet is trained to categorize inputs into easy/hard/complex classes. The main block identifies instances of easy/hard classes and classifies easy classes with high confidence. Only data with high probabilities of belonging to hard classes would be sent to the extension block for prediction. Further, only if the neural network at the edge shows low confidence in the prediction, the instance is considered complex and sent to the cloud for further processing. The training technique allows the majority of inference to run on edge devices, going to the cloud only for a small set of complex jobs, as determined by the edge. The performance of the proposed system is evaluated via extensive experiments using modified models of ResNets and MobileNetV2 on CIFAR-100 and ImageNet datasets. The results show that the proposed distributed model has improved accuracy and energy consumption, indicating its capacity to adapt.
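The easy/hard/complex routing reads naturally as a confidence-threshold cascade. In the sketch below, the thresholds and the max-probability exit rule are hypothetical stand-ins for the exit criteria the trained MEANet actually uses.

```python
def edge_cloud_inference(x, main_block, ext_block, cloud_model,
                         t_main=0.9, t_ext=0.7):
    """Cascade in the spirit of MEANet's easy/hard/complex routing;
    thresholds and exit rules here are illustrative assumptions."""
    probs = main_block(x)
    if probs.max() >= t_main:      # easy class: exit at the edge main block
        return probs.argmax()
    probs = ext_block(x)           # hard class: try the edge extension block
    if probs.max() >= t_ext:
        return probs.argmax()
    return cloud_model(x)          # complex input: defer to the cloud
```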


【4】 Few-Shot Intent Detection via Contrastive Pre-Training and Fine-Tuning
Link: https://arxiv.org/abs/2109.06349

Authors: Jianguo Zhang, Trung Bui, Seunghyun Yoon, Xiang Chen, Zhiwei Liu, Congying Xia, Quan Hung Tran, Walter Chang, Philip Yu
Affiliations: University of Illinois at Chicago, Chicago, USA; Adobe Research, San Jose, USA
Note: Accepted by EMNLP 2021 main conference
Abstract: In this work, we focus on a more challenging few-shot intent detection scenario where many intents are fine-grained and semantically similar. We present a simple yet effective few-shot intent detection schema via contrastive pre-training and fine-tuning. Specifically, we first conduct self-supervised contrastive pre-training on collected intent datasets, which implicitly learns to discriminate semantically similar utterances without using any labels. We then perform few-shot intent detection together with supervised contrastive learning, which explicitly pulls utterances from the same intent closer and pushes utterances across different intents farther. Experimental results show that our proposed method achieves state-of-the-art performance on three challenging intent detection datasets under 5-shot and 10-shot settings.
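The "pull same-intent utterances closer, push different intents apart" objective described here is the standard supervised contrastive (SupCon) loss; a minimal PyTorch version follows. This is the generic loss, not the authors' released code; the abstract describes using it alongside the few-shot intent-detection objective.

```python
import torch

def supervised_contrastive_loss(features, labels, temperature=0.07):
    """Generic SupCon loss. features: (B, d) L2-normalized utterance
    embeddings; labels: (B,) intent ids."""
    sim = features @ features.T / temperature
    not_self = ~torch.eye(len(features), dtype=torch.bool, device=features.device)
    positives = (labels[:, None] == labels[None, :]) & not_self  # same intent, not self
    sim = sim.masked_fill(~not_self, -1e9)                       # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    n_pos = positives.sum(dim=1).clamp(min=1)                    # guard anchors without positives
    return -((log_prob * positives.float()).sum(dim=1) / n_pos).mean()
```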


Reinforcement learning (7 papers)

【1】 ROMAX: Certifiably Robust Deep Multiagent Reinforcement Learning via Convex Relaxation
Link: https://arxiv.org/abs/2109.06795

Authors: Chuangchuang Sun, Dong-Ki Kim, Jonathan P. How
Affiliations: Massachusetts Institute of Technology
Abstract: In a multirobot system, a number of cyber-physical attacks (e.g., communication hijack, observation perturbations) can challenge the robustness of agents. This robustness issue worsens in multiagent reinforcement learning because there exists the non-stationarity of the environment caused by simultaneously learning agents whose changing policies affect the transition and reward functions. In this paper, we propose a minimax MARL approach to infer the worst-case policy update of other agents. As the minimax formulation is computationally intractable to solve, we apply the convex relaxation of neural networks to solve the inner minimization problem. Such convex relaxation enables robustness in interacting with peer agents that may have significantly different behaviors and also achieves a certified bound of the original optimization problem. We evaluate our approach on multiple mixed cooperative-competitive tasks and show that our method outperforms the previous state-of-the-art approaches on this topic.


【2】 Exploration in Deep Reinforcement Learning: A Comprehensive Survey
Link: https://arxiv.org/abs/2109.06668

Authors: Tianpei Yang, Hongyao Tang, Chenjia Bai, Jinyi Liu, Jianye Hao, Zhaopeng Meng, Peng Liu
Affiliations: Tianjin University; School of Computer Science and Technology, Harbin Institute of Technology
Note: arXiv admin note: text overlap with arXiv:1908.06976 by other authors
Abstract: Deep Reinforcement Learning (DRL) and Deep Multi-agent Reinforcement Learning (MARL) have achieved significant success across a wide range of domains, such as game AI, autonomous vehicles, robotics and finance. However, DRL and deep MARL agents are widely known to be sample-inefficient and millions of interactions are usually needed even for relatively simple game settings, thus preventing the wide application in real-industry scenarios. One bottleneck challenge behind this is the well-known exploration problem, i.e., how to efficiently explore the unknown environments and collect informative experiences that could benefit the policy learning most. In this paper, we conduct a comprehensive survey on existing exploration methods in DRL and deep MARL for the purpose of providing understandings and insights on the critical problems and solutions. We first identify several key challenges to achieve efficient exploration, which most of the exploration methods aim at addressing. Then we provide a systematic survey of existing approaches by classifying them into two major categories: uncertainty-oriented exploration and intrinsic motivation-oriented exploration. The essence of uncertainty-oriented exploration is to leverage the quantification of the epistemic and aleatoric uncertainty to derive efficient exploration. By contrast, intrinsic motivation-oriented exploration methods usually incorporate different reward-agnostic information for intrinsic exploration guidance. Beyond the above two main branches, we also cover other exploration methods that adopt sophisticated techniques but are difficult to classify into the above two categories. In addition, we provide a comprehensive empirical comparison of exploration methods for DRL on a set of commonly used benchmarks. Finally, we summarize the open problems of exploration in DRL and deep MARL and point out a few future directions.
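As a one-glance example of the intrinsic-motivation family the survey reviews, below is the classic count-based exploration bonus r_int(s) = beta / sqrt(N(s)), which the agent adds to the environment reward. This is a textbook scheme included for illustration, not a method proposed by the survey.

```python
import math
from collections import defaultdict

class CountBasedBonus:
    """Count-based intrinsic reward r_int(s) = beta / sqrt(N(s)): a classic
    representative of novelty-driven exploration."""
    def __init__(self, beta=0.1):
        self.visit_counts = defaultdict(int)
        self.beta = beta

    def intrinsic_reward(self, state):
        key = tuple(state)        # assumes a discrete/hashable state encoding
        self.visit_counts[key] += 1
        return self.beta / math.sqrt(self.visit_counts[key])
```

The agent then optimizes the sum of the environment reward and this bonus, so rarely visited states look temporarily more attractive.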


【3】 Towards optimized actions in critical situations of soccer games with  deep reinforcement learning
标题:基于深度强化学习的足球比赛危急情况下动作优化
链接:https://arxiv.org/abs/2109.06625

作者:Pegah Rahimian,Afshin Oroojlooy,Laszlo Toka
机构:Budapest University of Technology, and Economics, Budapest, Hungary, SAS Institute, Cary, NC, USA, MTA-BME Information Systems, Research Group
摘要:足球是一项奖励稀疏的运动:关键情况下任何聪明或粗心的动作都可能改变比赛结果。因此,球员、教练和球探都很想知道在关键情况下(例如极有可能丢失球权或进球的时刻)应采取的最佳动作。本文为足球比赛提出了一种新的状态表示,并用批量强化学习训练了一个智能策略网络。该网络获取当前局面的上下文信息,并给出能使球队预期进球(expected goal)最大化的最优动作。我们在InStat提供的104场欧洲足球比赛日志上进行了大量数值实验。结果表明,在全部104场比赛中,优化后的策略都比行为策略中的对应策略获得了更高的回报。此外,我们的框架学到的策略与现实世界中的预期行为十分接近。例如,在优化策略中我们观察到,在特定情况下,犯规或将球踢出界等动作有时比射门更有回报。
摘要:Soccer is a sparse rewarding game: any smart or careless action in critical situations can change the result of the match. Therefore players, coaches, and scouts are all curious about the best action to be performed in critical situations, such as the times with a high probability of losing ball possession or scoring a goal. This work proposes a new state representation for the soccer game and a batch reinforcement learning to train a smart policy network. This network gets the contextual information of the situation and proposes the optimal action to maximize the expected goal for the team. We performed extensive numerical experiments on the soccer logs made by InStat for 104 European soccer matches. The results show that in all 104 games, the optimized policy obtains higher rewards than its counterpart in the behavior policy. Besides, our framework learns policies that are close to the expected behavior in the real world. For instance, in the optimized policy, we observe that some actions such as foul, or ball out can be sometimes more rewarding than a shot in specific situations.


【4】 DSDF: An approach to handle stochastic agents in collaborative  multi-agent reinforcement learning
标题:DSDF:协作多智能体强化学习中处理随机智能体的方法
链接:https://arxiv.org/abs/2109.06609

作者:Satheesh K. Perepu,Kaushik Dey
机构:Ericsson Research (Artificial Intelligence), Chennai, Tamil Nadi, India
摘要:多智能体强化学习近年来受到广泛关注,并在许多不同领域得到应用。现有的"集中训练、分散执行"方法试图让智能体学习协调动作的模式,从而得到最优联合策略。然而,如果某些智能体具有不同程度的随机性,上述方法往往无法收敛,智能体之间的协调也较差。在本文中,我们展示了智能体的这种随机性(可能源于机器人的故障或老化)会如何增加协调中的不确定性,进而导致不理想的全局协调。在这种情况下,确定性智能体必须在求取最优联合策略的同时理解随机智能体的行为和局限。我们的解决方案DSDF根据不确定性调整各智能体的折扣因子,并用这些值更新单个智能体的效用网络。DSDF还有助于在协调中赋予一定程度的可靠性,即把即时的、轨迹较短的任务分配给随机智能体,而让确定性智能体承担需要较长规划的任务。这种方法能够在部分智能体性能退化的情况下实现联合协调,从而在许多情况下减少或推迟智能体/机器人更换的投入。在基准环境不同场景下的结果表明,与现有方法相比,本文所提方法是有效的。
摘要:Multi-Agent reinforcement learning has received lot of attention in recent years and have applications in many different areas. Existing methods involving Centralized Training and Decentralized execution, attempts to train the agents towards learning a pattern of coordinated actions to arrive at optimal joint policy. However if some agents are stochastic to varying degrees of stochasticity, the above methods often fail to converge and provides poor coordination among agents. In this paper we show how this stochasticity of agents, which could be a result of malfunction or aging of robots, can add to the uncertainty in coordination and there contribute to unsatisfactory global coordination. In this case, the deterministic agents have to understand the behavior and limitations of the stochastic agents while arriving at optimal joint policy. Our solution, DSDF which tunes the discounted factor for the agents according to uncertainty and use the values to update the utility networks of individual agents. DSDF also helps in imparting an extent of reliability in coordination thereby granting stochastic agents tasks which are immediate and of shorter trajectory with deterministic ones taking the tasks which involve longer planning. Such an method enables joint co-ordinations of agents some of which may be partially performing and thereby can reduce or delay the investment of agent/robot replacement in many circumstances. Results on benchmark environment for different scenarios shows the efficacy of the proposed approach when compared with existing approaches.
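
下面是"按不确定性调整折扣因子并用于效用网络更新"这一核心思想的玩具示意;指数衰减的函数形式与数值均为假设,并非DSDF的原文公式:

```python
import numpy as np

# 不确定性越大,有效规划视野越短(折扣因子越小)。
def tuned_gamma(uncertainty, gamma_max=0.99, k=2.0):
    return gamma_max * np.exp(-k * uncertainty)

# 将调后的折扣因子用于各自效用网络的 TD 目标。
def td_target(reward, next_utility, uncertainty):
    return reward + tuned_gamma(uncertainty) * next_utility

# 两个代理:代理0近乎确定,代理1高度随机
print(td_target(1.0, 5.0, uncertainty=0.05))  # 规划视野长
print(td_target(1.0, 5.0, uncertainty=0.8))   # 未来效用被大幅折扣
```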


【5】 Theoretical Guarantees of Fictitious Discount Algorithms for Episodic  Reinforcement Learning and Global Convergence of Policy Gradient Methods
标题:情景强化学习虚拟折扣算法的理论保证和策略梯度方法的全局收敛性
链接:https://arxiv.org/abs/2109.06362

作者:Xin Guo,Anran Hu,Junzi Zhang
备注:42 pages
摘要:在设计有限时间范围内幕式强化学习问题的算法时,一种常见的方法是引入一个虚构的贴现因子,并使用平稳策略进行近似。经验表明,虚拟贴现因子有助于减少方差,而固定策略有助于节省每次迭代的计算成本。然而,从理论上讲,目前还没有关于使用这种虚构折扣配方的算法的收敛性分析的工作。本文是分析这些算法的第一步。它着重于两个普通的政策梯度(VPG)变量:第一个是广泛使用的具有折扣优势估计(DAE)的变量,第二个是在政策梯度估计的得分函数中具有额外的虚拟折扣因子。两种算法都建立了非渐近收敛性保证,并证明了附加的折扣因子可以减少DAE中引入的偏差,从而提高算法的渐近收敛性。我们分析的一个关键要素是连接马尔可夫决策过程(MDP)的三个设置:有限时间范围、平均报酬和折扣设置。据我们所知,这是有限时间范围内MDP幕式强化学习的虚拟折扣算法的第一个理论保证,这也导致有限时间范围内幕式强化学习的策略梯度方法(第一个)全局收敛。
摘要:When designing algorithms for finite-time-horizon episodic reinforcement learning problems, a common approach is to introduce a fictitious discount factor and use stationary policies for approximations. Empirically, it has been shown that the fictitious discount factor helps reduce variance, and stationary policies serve to save the per-iteration computational cost. Theoretically, however, there is no existing work on convergence analysis for algorithms with this fictitious discount recipe. This paper takes the first step towards analyzing these algorithms. It focuses on two vanilla policy gradient (VPG) variants: the first being a widely used variant with discounted advantage estimations (DAE), the second with an additional fictitious discount factor in the score functions of the policy gradient estimators. Non-asymptotic convergence guarantees are established for both algorithms, and the additional discount factor is shown to reduce the bias introduced in DAE and thus improve the algorithm convergence asymptotically. A key ingredient of our analysis is to connect three settings of Markov decision processes (MDPs): the finite-time-horizon, the average reward and the discounted settings. To our best knowledge, this is the first theoretical guarantee on fictitious discount algorithms for the episodic reinforcement learning of finite-time-horizon MDPs, which also leads to the (first) global convergence of policy gradient methods for finite-time-horizon episodic reinforcement learning.
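
下面用几行代码示意"虚拟折扣配方":有限时间范围的幕式问题本身无折扣,但在估计回报/优势(DAE)时引入一个小于1的虚拟折扣因子以降低方差。数值与基线选择均为假设:

```python
import numpy as np

# 带虚拟折扣因子 gamma 的回报估计(从轨迹末端向前回溯)。
def discounted_returns(rewards, gamma=0.95):
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return np.array(out[::-1])

rewards = [0.0, 0.0, 1.0, 0.0, 2.0]       # 一条长度为 5 的轨迹
returns = discounted_returns(rewards)      # 带虚拟折扣的回报估计
advantages = returns - returns.mean()      # 最简单的基线化优势估计
print(advantages)
```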


【6】 Achieving Zero Constraint Violation for Constrained Reinforcement  Learning via Primal-Dual Approach
标题:用原始-对偶方法实现约束强化学习的零约束违反
链接:https://arxiv.org/abs/2109.06332

作者:Qinbo Bai,Amrit Singh Bedi,Mridul Agarwal,Alec Koppel,Vaneet Aggarwal
机构:Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA; US Army Research Lab; Department of Industrial Engineering
摘要:强化学习广泛应用于需要在与环境交互的同时进行序贯决策的应用中。当决策要求包含满足某些安全约束时,问题变得更具挑战性。该问题在数学上被表述为约束马尔可夫决策过程(CMDP)。文献中已有多种算法能以无模型方式求解CMDP问题,得到$\epsilon$-最优累积奖励和$\epsilon$-可行策略。$\epsilon$-可行策略意味着它存在约束违反。这里的一个重要问题是,我们能否在零约束违反的情况下实现$\epsilon$-最优累积奖励。为此,我们提倡使用随机原始-对偶方法求解CMDP问题,并提出了一种保守的随机原始-对偶算法(CSPDA),证明其能以$\tilde{\mathcal{O}}(1/\epsilon^2)$的样本复杂度实现零约束违反下的$\epsilon$-最优累积奖励。在以往工作中,零约束违反的$\epsilon$-最优策略的最佳已知样本复杂度为$\tilde{\mathcal{O}}(1/\epsilon^5)$。因此,与现有技术相比,所提算法带来了显著改进。
摘要:Reinforcement learning is widely used in applications where one needs to perform sequential decisions while interacting with the environment. The problem becomes more challenging when the decision requirement includes satisfying some safety constraints. The problem is mathematically formulated as constrained Markov decision process (CMDP). In the literature, various algorithms are available to solve CMDP problems in a model-free manner to achieve $\epsilon$-optimal cumulative reward with $\epsilon$ feasible policies. An $\epsilon$-feasible policy implies that it suffers from constraint violation. An important question here is whether we can achieve $\epsilon$-optimal cumulative reward with zero constraint violations or not. To achieve that, we advocate the use of a randomized primal-dual approach to solving the CMDP problems and propose a conservative stochastic primal-dual algorithm (CSPDA) which is shown to exhibit $\tilde{\mathcal{O}}(1/\epsilon^2)$ sample complexity to achieve $\epsilon$-optimal cumulative reward with zero constraint violations. In the prior works, the best available sample complexity for the $\epsilon$-optimal policy with zero constraint violation is $\tilde{\mathcal{O}}(1/\epsilon^5)$. Hence, the proposed algorithm provides a significant improvement as compared to the state of the art.
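
作为随机原始-对偶方法机制的一个玩具示意,下面在一个标量问题上交替更新原始变量与对偶变量(拉格朗日乘子);目标函数、约束与步长均为假设,仅演示"原始上升、对偶投影更新"的基本循环,并非CSPDA算法本身:

```python
# 对约束问题 max V_r(theta) s.t. V_c(theta) <= b 构造拉格朗日函数
#   L(theta, lam) = V_r(theta) - lam * (V_c(theta) - b)
theta, lam, b = 0.0, 0.0, 1.0
V_r = lambda th: -(th - 2.0) ** 2          # 奖励价值(希望最大化)
V_c = lambda th: th                        # 约束价值,要求 V_c <= b

for _ in range(200):
    grad_theta = -2 * (theta - 2.0) - lam  # dL/d(theta)
    theta += 0.05 * grad_theta             # 原始变量梯度上升
    lam = max(0.0, lam + 0.05 * (V_c(theta) - b))  # 对偶变量投影更新

print(round(theta, 3), round(lam, 3))  # 收敛到约束边界 theta ≈ 1, lam ≈ 2
```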


【7】 safe-control-gym: a Unified Benchmark Suite for Safe Learning-based  Control and Reinforcement Learning
标题:安全控制健身房:基于安全学习的控制和强化学习的统一基准套件
链接:https://arxiv.org/abs/2109.06325

作者:Zhaocong Yuan,Adam W. Hall,Siqi Zhou,Lukas Brunke,Melissa Greeff,Jacopo Panerati,Angela P. Schoellig
机构:University of Toronto; the University of Toronto Robotics Institute; and affiliated with the Vector Institute for Artificial Intelligence in Toronto
备注:8 pages, 8 figures
摘要:近年来,强化学习和基于学习的控制,以及对其安全性(这对在现实世界机器人上的部署至关重要)的研究,都获得了显著的发展。然而,为了充分衡量新成果的进展和适用性,我们需要能够公平比较控制界和强化学习界所提方法的工具。在此,我们提出了一个新的开源基准测试套件,称为safe-control-gym。我们的出发点是OpenAI的Gym API,它是强化学习研究中事实上的标准之一。然而,我们指出了它对控制理论研究者(尤其是安全控制方向)吸引力有限的原因,例如缺乏解析模型和约束规范。因此,我们建议扩展此API:(i)增加指定(和查询)符号模型和约束的能力;(ii)在控制输入、测量和惯性特性中引入模拟扰动。我们提供了三个动力学系统(小车倒立摆、1D和2D四旋翼)以及两个控制任务(镇定和轨迹跟踪)的实现。为了演示我们的建议,同时也为了让各研究社区走得更近,我们展示了如何使用safe-control-gym定量比较来自传统控制、基于学习的控制和强化学习领域的多种方法在控制性能、数据效率和安全性方面的表现。
摘要:In recent years, reinforcement learning and learning-based control -- as well as the study of their safety, crucial for deployment in real-world robots -- have gained significant traction. However, to adequately gauge the progress and applicability of new results, we need the tools to equitably compare the approaches proposed by the controls and reinforcement learning communities. Here, we propose a new open-source benchmark suite, called safe-control-gym. Our starting point is OpenAI's Gym API, which is one of the de facto standard in reinforcement learning research. Yet, we highlight the reasons for its limited appeal to control theory researchers -- and safe control, in particular. E.g., the lack of analytical models and constraint specifications. Thus, we propose to extend this API with (i) the ability to specify (and query) symbolic models and constraints and (ii) introduce simulated disturbances in the control inputs, measurements, and inertial properties. We provide implementations for three dynamic systems -- the cart-pole, 1D, and 2D quadrotor -- and two control tasks -- stabilization and trajectory tracking. To demonstrate our proposal -- and in an attempt to bring research communities closer together -- we show how to use safe-control-gym to quantitatively compare the control performance, data efficiency, and safety of multiple approaches from the areas of traditional control, learning-based control, and reinforcement learning.


符号|符号学习(1篇)

【1】 An Insect-Inspired Randomly, Weighted Neural Network with Random Fourier  Features For Neuro-Symbolic Relational Learning
标题:用于神经符号关系学习的具有随机傅立叶特征的昆虫启发随机加权神经网络
链接:https://arxiv.org/abs/2109.06663

作者:Jinyung Hong,Theodore P. Pavlic
机构:School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ , USA, School of Sustainability, Arizona State University, Tempe, AZ , USA, School of Complex Adaptive Systems, Arizona State University, Tempe, AZ , USA
备注:17 pages, 5 figures, 2 tables, submitted to NeSy20/21 @ IJCLR. arXiv admin note: text overlap with arXiv:2006.12392
摘要:果蝇和蜜蜂等昆虫可以解决简单的联想学习任务,并学习"相同"与"不同"等抽象概念,这被视为一种高阶认知功能,通常被认为依赖于自上而下的新皮质处理。对果蝇的实证研究有力地支持昆虫大脑的嗅觉处理采用了随机化的表征结构。基于这些结果,我们提出了一种随机加权特征网络(RWFN),该网络在编码器中引入随机抽取且不训练的权重,并使用一个改造过的线性模型作为解码器。在RWFN中,输入神经元与昆虫大脑高阶处理中心之间的随机投影由单隐层神经网络模拟,该网络使用随机傅立叶特征在隐层中专门构造潜在表示,借助核近似(kernel approximation)更好地表示输入之间的复杂关系。得益于这种特殊的表示,RWFN只需训练线性解码器模型,就能有效地学习输入之间的关系程度。我们在语义图像解释(SII)任务上比较了RWFN与LTN的性能,这些任务是展示LTN如何利用一阶逻辑推理超越纯数据驱动方法的代表性示例。我们证明,与LTN相比,RWFN在SII任务中的对象分类和对象间部分-整体(part-of)关系检测上可以取得更好或相当的性能,同时使用远更少的可学习参数(1:62的比率)和更快的学习过程(1:2的运行速度比率)。此外,我们还表明,由于随机权重不依赖于数据,多个解码器可以共享同一个随机编码器,这使RWFN在同步分类任务中具有独特的空间规模经济性。
摘要:Insects, such as fruit flies and honey bees, can solve simple associative learning tasks and learn abstract concepts such as "sameness" and "difference", which is viewed as a higher-order cognitive function and typically thought to depend on top-down neocortical processing. Empirical research with fruit flies strongly supports that a randomized representational architecture is used in olfactory processing in insect brains. Based on these results, we propose a Randomly Weighted Feature Network (RWFN) that incorporates randomly drawn, untrained weights in an encoder that uses an adapted linear model as a decoder. The randomized projections between input neurons and higher-order processing centers in the input brain is mimicked in RWFN by a single-hidden-layer neural network that specially structures latent representations in the hidden layer using random Fourier features that better represent complex relationships between inputs using kernel approximation. Because of this special representation, RWFNs can effectively learn the degree of relationship among inputs by training only a linear decoder model. We compare the performance of RWFNs to LTNs for Semantic Image Interpretation (SII) tasks that have been used as a representative example of how LTNs utilize reasoning over first-order logic to surpass the performance of solely data-driven methods. We demonstrate that compared to LTNs, RWFNs can achieve better or similar performance for both object classification and detection of the part-of relations between objects in SII tasks while using much far fewer learnable parameters (1:62 ratio) and a faster learning process (1:2 ratio of running speed). Furthermore, we show that because the randomized weights do not depend on the data, several decoders can share a single randomized encoder, giving RWFNs a unique economy of spatial scale for simultaneous classification tasks.
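
下面给出RWFN核心构件之一,即随机傅立叶特征(Rahimi-Recht)加线性解码器的极简示意:编码器权重随机抽取且不训练,仅用最小二乘拟合线性解码器。维度、带宽与目标函数均为假设:

```python
import numpy as np
from numpy.linalg import lstsq

rng = np.random.default_rng(0)
d_in, D = 4, 200                        # 输入维度、随机特征数
W = rng.normal(0, 1.0, size=(d_in, D))  # 随机权重,近似 RBF 核
b = rng.uniform(0, 2 * np.pi, size=D)

def rff(X):
    # 随机傅立叶特征:z(x) = sqrt(2/D) * cos(xW + b)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = rng.normal(size=(500, d_in))
y = np.sin(X[:, 0]) * X[:, 1]            # 任意非线性目标
beta, *_ = lstsq(rff(X), y, rcond=None)  # 只拟合线性解码器
print(np.mean((rff(X) @ beta - y) ** 2)) # 训练均方误差
```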


医学相关(4篇)

【1】 COVID-Net Clinical ICU: Enhanced Prediction of ICU Admission for  COVID-19 Patients via Explainability and Trust Quantification
标题:COVID-NET临床ICU:通过可解释性和可信性量化加强对冠状病毒患者入院ICU的预测
链接:https://arxiv.org/abs/2109.06711

作者:Audrey Chung,Mahmoud Famouri,Andrew Hryniowski,Alexander Wong
机构:University of Waterloo, Canada; Waterloo Artificial Intelligence Institute
备注:5 pages
摘要:新冠肺炎(COVID-19)大流行继续对全球造成毁灭性影响,给世界各地苦苦支撑的医疗系统带来了巨大负担。鉴于资源有限,准确的患者分诊和护理规划对于抗击COVID-19至关重要,而护理规划中的一项关键任务是确定患者是否应入住医院的重症监护病房(ICU)。出于对透明、可信的ICU入院临床决策支持的需求,我们提出了COVID-Net Clinical ICU,一个基于患者临床数据预测ICU入院的神经网络。在透明、以信任为中心的方法论驱动下,所提出的COVID-Net Clinical ICU使用来自Sirio-Libanes医院、包含1925名COVID-19患者的临床数据集构建,能够以96.9%的准确率预测COVID-19阳性患者何时需要入住ICU,以便在当前大流行中帮助医院更好地制定护理计划。我们使用定量可解释性策略进行系统级洞察发现,研究不同临床特征对决策的影响,并获得可用于提升预测性能的可操作洞察。我们进一步利用一套信任量化指标,深入考察COVID-Net Clinical ICU的可信度。通过深入挖掘临床预测模型何时以及为何做出某些决策,我们可以发现ICU入院预测等关键临床决策支持任务中决策的关键因素,并确定在哪些情形下可以信任临床预测模型,从而实现更强的问责性。
摘要:The COVID-19 pandemic continues to have a devastating global impact, and has placed a tremendous burden on struggling healthcare systems around the world. Given the limited resources, accurate patient triaging and care planning is critical in the fight against COVID-19, and one crucial task within care planning is determining if a patient should be admitted to a hospital's intensive care unit (ICU). Motivated by the need for transparent and trustworthy ICU admission clinical decision support, we introduce COVID-Net Clinical ICU, a neural network for ICU admission prediction based on patient clinical data. Driven by a transparent, trust-centric methodology, the proposed COVID-Net Clinical ICU was built using a clinical dataset from Hospital Sirio-Libanes comprising of 1,925 COVID-19 patients, and is able to predict when a COVID-19 positive patient would require ICU admission with an accuracy of 96.9% to facilitate better care planning for hospitals amidst the on-going pandemic. We conducted system-level insight discovery using a quantitative explainability strategy to study the decision-making impact of different clinical features and gain actionable insights for enhancing predictive performance. We further leveraged a suite of trust quantification metrics to gain deeper insights into the trustworthiness of COVID-Net Clinical ICU. By digging deeper into when and why clinical predictive models makes certain decisions, we can uncover key factors in decision making for critical clinical decision support tasks such as ICU admission prediction and identify the situations under which clinical predictive models can be trusted for greater accountability.


【2】 A pragmatic approach to estimating average treatment effects from EHR  data: the effect of prone positioning on mechanically ventilated COVID-19  patients
标题:从EHR数据评估平均治疗效果的实用方法:俯卧位对机械通气冠状病毒患者的影响
链接:https://arxiv.org/abs/2109.06707

作者:Adam Izdebski,Patrick J Thoral,Robbert C A Lalisang,Dean M McHugh,Robert Entjes,Nardo J M van der Meer,Dave A Dongelmans,Age D Boelens,Sander Rigter,Stefaan H A Hendriks,Remko de Jong,Marlijn J A Kamps,Marco Peters,A Karakus,Diederik Gommers,Dharmanand Ramnarain,Evert-Jan Wils,Sefanja Achterberg,Ralph Nowitzky,Walter van den Tempel,Cornelis P C de Jager,Fleur G C A Nooteboom,Evelien Oostdijk,Peter Koetsier,Alexander D Cornet,Auke C Reidinga,Wouter de Ruijter,Rob J Bosman,Tim Frenzel,Louise C Urlings-Strop,Paul de Jong,Ellen G M Smit,Olaf L Cremer,Frits H M van Osch,Harald J Faber,Judith Lens,Gert B Brunnekreef,Barbara Festen-Spanjer,Tom Dormans,Bram Simons,A A Rijkeboer,Annemieke Dijkstra,Sesmu Arbous,Marcel Aries,Menno Beukema,Rutger van Raalte,Martijn van Tellingen,Niels C Gritters van den Oever,Paul W G Elbers,Giovanni Cinà
机构:Pacmed, Amsterdam, NL; Department of Intensive Care Medicine, Laboratory for Critical Care, Computational Intelligence, Amsterdam Medical Data Science, Amsterdam UMC, Amsterdam, NL; on behalf of the Dutch ICU Data Sharing Against COVID-19 Collaborators
摘要:尽管因果推断领域最近取得了进展,但迄今为止,仍没有公认的方法能从观察数据中获得治疗效果估计。这对临床实践的影响是:在缺乏随机试验结果时,医务人员没有任何依据来判断在真实场景下什么是有效的。本文展示了一种从观察性研究中获得治疗效果初步估计的实用方法。我们在一个COVID-19重症监护患者队列中,评估了俯卧位操作对氧合水平的治疗效果,研究设计参照了最近一项关于俯卧位的随机对照试验(PROSEVA试验)。我们采用线性回归、倾向评分模型(如blocking和DR-IPW)、BART以及两种版本的反事实回归,对来自25家荷兰医院的第一波COVID-19 ICU患者的观察数据进行估计。研究纳入745名机械通气患者的6371个数据点。根据不同模型,俯卧位的早期效应(俯卧位后2至8小时的P/F比)估计值介于14.54至20.11毫米汞柱之间;俯卧位的晚期效应(俯卧位后12至24小时的氧合)估计值介于13.53至15.26毫米汞柱之间。所有置信区间均严格大于零,表明俯卧位对COVID-19患者氧合的影响是正向的,且其幅度与对非COVID-19患者的影响相当。这些结果为俯卧位治疗COVID-19患者的有效性提供了进一步证据。这项研究连同附带的开源代码,为缺乏RCT数据情形下的治疗效果估计提供了一个蓝本。资金来源:SIDN基金、CovidPredict联盟、Pacmed。
摘要:Despite the recent progress in the field of causal inference, to date there is no agreed upon methodology to glean treatment effect estimation from observational data. The consequence on clinical practice is that, when lacking results from a randomized trial, medical personnel is left without guidance on what seems to be effective in a real-world scenario. This article showcases a pragmatic methodology to obtain preliminary estimation of treatment effect from observational studies. Our approach was tested on the estimation of treatment effect of the proning maneuver on oxygenation levels, on a cohort of COVID-19 Intensive Care patients. We modeled our study design on a recent RCT for proning (the PROSEVA trial). Linear regression, propensity score models such as blocking and DR-IPW, BART and two versions of Counterfactual Regression were employed to provide estimates on observational data comprising first wave COVID-19 ICU patient data from 25 Dutch hospitals. 6371 data points, from 745 mechanically ventilated patients, were included in the study. Estimates for the early effect of proning -- P/F ratio from 2 to 8 hours after proning -- ranged between 14.54 and 20.11 mm Hg depending on the model. Estimates for the late effect of proning -- oxygenation from 12 to 24 hours after proning -- ranged between 13.53 and 15.26 mm Hg. All confidence interval being strictly above zero indicated that the effect of proning on oxygenation for COVID-19 patient was positive and comparable in magnitude to the effect on non COVID-19 patients. These results provide further evidence on the effectiveness of proning on the treatment of COVID-19 patients. This study, along with the accompanying open-source code, provides a blueprint for treatment effect estimation in scenarios where RCT data is lacking. Funding: SIDN fund, CovidPredict consortium, Pacmed.
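
下面在模拟数据上示意文中提到的DR-IPW(双重稳健)平均治疗效果估计:先用逻辑回归估计倾向评分,再结合结局回归构造AIPW估计量。数据生成过程与真实效应均为假设,与论文数据无关:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 3))                          # 患者协变量
p = 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))     # 真实倾向
T = rng.binomial(1, p)                               # 是否接受治疗(如俯卧位)
y = 15.0 * T + 4.0 * X[:, 0] + rng.normal(0, 2, n)   # 结局,真实效应 = 15

e = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]      # 估计倾向评分
mu1 = LinearRegression().fit(X[T == 1], y[T == 1]).predict(X)  # 治疗组结局模型
mu0 = LinearRegression().fit(X[T == 0], y[T == 0]).predict(X)  # 对照组结局模型
# 双重稳健(AIPW)估计量:倾向评分模型或结局模型之一正确即一致
ate = np.mean(T * (y - mu1) / e - (1 - T) * (y - mu0) / (1 - e) + mu1 - mu0)
print(round(ate, 2))  # 应接近 15
```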


【3】 Identifying partial mouse brain microscopy images from Allen reference  atlas using a contrastively learned semantic space
标题:使用对比学习的语义空间从Allen参考图谱识别部分小鼠脑显微镜图像
链接:https://arxiv.org/abs/2109.06662

作者:Justinas Antanavicius,Roberto Leiras Gonzalez,Raghavendra Selvan
机构:† Department of Computer Science, University of Copenhagen, Denmark, ⋆ Department of Neuroscience, University of Copenhagen, Denmark
备注:Source code available at this https URL 7 pages, 5 figures
摘要:在将小鼠大脑的解剖结构配准到参考图谱时,精确识别小鼠大脑显微镜图像是至关重要的第一步。从业者通常依靠人工比对图像,或依赖假定完整图像存在的工具。本工作探索用暹罗网络为给定的部分二维小鼠大脑图像寻找对应的二维参考图谱板。暹罗网络是一类卷积神经网络(CNN),它使用权值共享路径获得输入图像对的低维嵌入。部分小鼠脑图像与参考图谱板之间的对应关系,根据暹罗网络通过对比学习获得的脑切片与图谱板低维嵌入之间的距离来确定。实验表明,当训练和测试图像来自同一来源时,暹罗CNN可以使用Allen小鼠脑图谱精确识别脑切片,其TOP-1和TOP-5准确率分别达到25%和100%,识别29幅图像仅需7.2秒。
摘要:Precise identification of mouse brain microscopy images is a crucial first step when anatomical structures in the mouse brain are to be registered to a reference atlas. Practitioners usually rely on manual comparison of images or tools that assume the presence of complete images. This work explores Siamese Networks as the method for finding corresponding 2D reference atlas plates for given partial 2D mouse brain images. Siamese networks are a class of convolutional neural networks (CNNs) that use weight-shared paths to obtain low dimensional embeddings of pairs of input images. The correspondence between the partial mouse brain image and reference atlas plate is determined based on the distance between low dimensional embeddings of brain slices and atlas plates that are obtained from Siamese networks using contrastive learning. Experiments showed that Siamese CNNs can precisely identify brain slices using the Allen mouse brain atlas when training and testing images come from the same source. They achieved TOP-1 and TOP-5 accuracy of 25% and 100%, respectively, taking only 7.2 seconds to identify 29 images.
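
下面给出"权值共享编码器 + 对比损失"的极简PyTorch示意,对应文中用嵌入距离判断脑切片与图谱板对应关系的思路;网络结构、输入尺寸与margin均为假设:

```python
import torch
import torch.nn as nn

# 权值共享编码器:脑切片与图谱板经同一网络映射到低维嵌入空间。
enc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128), nn.ReLU(), nn.Linear(128, 32))

def contrastive_loss(za, zb, same, margin=1.0):
    d = torch.norm(za - zb, dim=1)
    # 对应的图像对拉近距离,不对应的推开到 margin 之外
    return (same * d.pow(2) + (1 - same) * torch.clamp(margin - d, min=0).pow(2)).mean()

slice_img = torch.randn(8, 1, 64, 64)   # 一个 batch 的脑切片
plate_img = torch.randn(8, 1, 64, 64)   # 对应 / 不对应的图谱板
same = torch.randint(0, 2, (8,)).float()
loss = contrastive_loss(enc(slice_img), enc(plate_img), same)
loss.backward()
```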


【4】 Deep Generative Models for Drug Design and Response
标题:药物设计与反应的深度生成模型
链接:https://arxiv.org/abs/2109.06469

作者:Karina Zadorozhny,Lada Nuzhna
机构:Northwestern University
摘要:设计具有理想药物性质的新化合物是一项具有挑战性的任务,需要多年的开发和测试。尽管如此,大多数新药仍未能证明有效。深度生成模型的最新成功为新分子的生成和优化带来了希望。在这篇综述文章中,我们概述了当前的生成模型,并描述了必要的生物学和化学术语,包括理解药物设计和药物反应领域所需的分子表征。我们介绍了常用的化学和生物数据库,以及用于生成建模的工具。最后,我们总结了药物设计和药物反应预测的生成模型的现状,强调了该领域目前面临的最新方法和局限性。
摘要:Designing new chemical compounds with desired pharmaceutical properties is a challenging task and takes years of development and testing. Still, a majority of new drugs fail to prove efficient. Recent success of deep generative modeling holds promises of generation and optimization of new molecules. In this review paper, we provide an overview of the current generative models, and describe necessary biological and chemical terminology, including molecular representations needed to understand the field of drug design and drug response. We present commonly used chemical and biological databases, and tools for generative modeling. Finally, we summarize the current state of generative modeling for drug design and drug response prediction, highlighting the state-of-art approaches and limitations the field is currently facing.


蒸馏|知识提取(2篇)

【1】 What are the attackers doing now? Automating cyber threat intelligence  extraction from text on pace with the changing threat landscape: A survey
标题:袭击者现在在做什么?根据不断变化的威胁格局自动从文本中提取网络威胁情报:一项调查
链接:https://arxiv.org/abs/2109.06808

作者:Md Rayhanur Rahman,Rezvan Mahdavi-Hezaveh,Laurie Williams
机构:North Carolina State University, USA
摘要:网络安全研究人员已为从威胁报告、在线文章等文本来源(其中描述了网络攻击的战术、程序和工具)自动提取CTI做出了贡献。本文的目的是通过对文献中相关研究的调查,帮助网络安全研究人员了解当前用于从文本中提取网络威胁情报的技术。我们从文献中系统地收集了"从文本中提取CTI"的相关研究,并对CTI提取目的进行了分类。我们提出了一个从这些研究中抽象出的CTI提取管道,并梳理了在该管道语境下所使用的数据源、技术和CTI共享格式。我们的工作发现了十种类型的提取目的,例如入侵指标(indicators of compromise)提取、TTP(攻击的战术、技术、程序)提取和网络安全关键词提取。我们还梳理出七类用于CTI提取的文本来源,其中来自黑客论坛、威胁报告、社交媒体帖子和在线新闻文章的文本数据被近90%的研究使用。CTI提取采用了自然语言处理以及有监督和无监督机器学习技术,如命名实体识别、主题建模、依存句法分析、有监督分类和聚类。我们观察到与这些研究相关的技术挑战,主要涉及如何获得可用的、干净的、带标注的数据,以确保研究的复现、验证和进一步扩展。鉴于现有研究聚焦于从文本中提取CTI信息,我们主张在当前CTI提取工作的基础上,帮助网络安全从业人员进行主动决策,例如威胁优先级排序、自动化威胁建模,以利用过去网络安全事件中的知识。
摘要:Cybersecurity researchers have contributed to the automated extraction of CTI from textual sources, such as threat reports and online articles, where cyberattack strategies, procedures, and tools are described. The goal of this article is to aid cybersecurity researchers understand the current techniques used for cyberthreat intelligence extraction from text through a survey of relevant studies in the literature. We systematically collect "CTI extraction from text"-related studies from the literature and categorize the CTI extraction purposes. We propose a CTI extraction pipeline abstracted from these studies. We identify the data sources, techniques, and CTI sharing formats utilized in the context of the proposed pipeline. Our work finds ten types of extraction purposes, such as extraction of indicators of compromise, TTPs (tactics, techniques, procedures of attack), and cybersecurity keywords. We also identify seven types of textual sources for CTI extraction, and textual data obtained from hacker forums, threat reports, social media posts, and online news articles have been used by almost 90% of the studies. Natural language processing along with both supervised and unsupervised machine learning techniques such as named entity recognition, topic modelling, dependency parsing, supervised classification, and clustering are used for CTI extraction. We observe the technical challenges associated with these studies related to obtaining available clean, labelled data which could assure replication, validation, and further extension of the studies. As we find the studies focusing on CTI information extraction from text, we advocate for building upon the current CTI extraction work to help cybersecurity practitioners with proactive decision making such as threat prioritization, automated threat modelling to utilize knowledge from past cybersecurity incidents.


【2】 Exploring the Connection between Knowledge Distillation and Logits  Matching
标题:探索知识蒸馏与Logits匹配之间的联系
链接:https://arxiv.org/abs/2109.06458

作者:Defang Chen,Can Wang,Yan Feng,Chun Chen
机构:College of Computer Science, Zhejiang University, China.
备注:Technical Report
摘要:知识蒸馏是一种用于模型压缩的广义logits匹配技术。二者的等价性此前是在$\textit{infinity temperature}$和$\textit{zero-mean normalization}$的条件下建立的。本文证明,仅需$\textit{infinity temperature}$这一条件,知识蒸馏的效果就等于带有额外正则化的logits匹配。此外,我们还揭示,一个更弱的条件,即$\textit{equal-mean initialization}$(而非原来的$\textit{zero-mean normalization}$),已足以建立这种等价性。证明的关键在于我们认识到,在使用交叉熵损失和softmax激活的现代神经网络中,反向传播到logits上的梯度均值始终保持为零。
摘要:Knowledge distillation is a generalized logits matching technique for model compression. Their equivalence is previously established on the condition of $\textit{infinity temperature}$ and $\textit{zero-mean normalization}$. In this paper, we prove that with only $\textit{infinity temperature}$, the effect of knowledge distillation equals to logits matching with an extra regularization. Furthermore, we reveal that an additional weaker condition -- $\textit{equal-mean initialization}$ rather than the original $\textit{zero-mean normalization}$ already suffices to set up the equivalence. The key to our proof is we realize that in modern neural networks with the cross-entropy loss and softmax activation, the mean of back-propagated gradient on logits always keeps zero.
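
下面并列给出带温度的知识蒸馏损失与logits匹配(MSE)损失的标准PyTorch写法,便于直观对照文中讨论的等价条件;batch大小与温度为任取:

```python
import torch
import torch.nn.functional as F

# 知识蒸馏损失:温度 T 越大,其梯度越接近 logits 匹配。
def kd_loss(student_logits, teacher_logits, T=4.0):
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

# logits 匹配:直接对师生 logits 做均方误差。
def logits_matching_loss(student_logits, teacher_logits):
    return F.mse_loss(student_logits, teacher_logits)

s, t = torch.randn(16, 10, requires_grad=True), torch.randn(16, 10)
print(kd_loss(s, t).item(), logits_matching_loss(s, t).item())
```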


推荐(2篇)

【1】 Sequential Modelling with Applications to Music Recommendation,  Fact-Checking, and Speed Reading
标题:序贯建模及其在音乐推荐、事实核查和快速阅读中的应用
链接:https://arxiv.org/abs/2109.06736

作者:Christian Hansen
机构:University of Copenhagen; Advisors: Stephen Alstrup, Christina Lioma, Jakob Grue Simonsen
备注:PhD Thesis, University of Copenhagen, Faculty of Science
摘要:序列建模需要理解序列数据,这自然发生在广泛的领域中。一个例子是与用户交互的系统,记录用户的操作和行为,并根据用户以前的交互对用户可能感兴趣的项目提出建议。在这种情况下,用户交互的顺序通常指示用户接下来感兴趣的内容。类似地,对于自动推断文本语义的系统来说,捕获句子中单词的顺序是至关重要的,因为即使是轻微的重新排序也可能显著改变其原始含义。本论文在方法论上做出了贡献,并对序列建模的具体应用领域进行了新的研究,这些领域包括向听众推荐音乐曲目的系统,以及处理文本语义以自动检查声明的系统,或“快速阅读”文本以进行有效的进一步分类的系统。(由于arXiv摘要限制,省略了摘要的其余部分)
摘要:Sequential modelling entails making sense of sequential data, which naturally occurs in a wide array of domains. One example is systems that interact with users, log user actions and behaviour, and make recommendations of items of potential interest to users on the basis of their previous interactions. In such cases, the sequential order of user interactions is often indicative of what the user is interested in next. Similarly, for systems that automatically infer the semantics of text, capturing the sequential order of words in a sentence is essential, as even a slight re-ordering could significantly alter its original meaning. This thesis makes methodological contributions and new investigations of sequential modelling for the specific application areas of systems that recommend music tracks to listeners and systems that process text semantics in order to automatically fact-check claims, or "speed read" text for efficient further classification. (Rest of abstract omitted due to arXiv abstract limit)


【2】 Simulations in Recommender Systems: An industry perspective
标题:推荐系统中的模拟:行业视角
链接:https://arxiv.org/abs/2109.06723

作者:Lucas Bernardi,Sakshi Batra,Cintia Alicia Bruscantini
备注:6 pages
摘要:构建有效的推荐系统(RS)是一个复杂的过程,这主要是因为RS的本质涉及大规模软件系统和人机交互。迭代开发流程既需要对当前基线的深入理解,也需要能够估计多个目标变量发生变化时的影响。模拟非常适合应对这两项挑战,并有望带来高速度的构建过程,而这是商业环境中的基本要求。最近,业界对RS模拟平台产生了浓厚兴趣,它使RS开发人员能够轻松搭建模拟环境来分析其系统。在这项工作中,我们讨论了模拟如何帮助提升迭代速度;我们回顾了有关RS模拟平台的文献,分析其优势与不足,并提炼出一套RS模拟平台的设计指导原则,我们相信这些原则将最大限度地提高迭代式RS构建过程的速度。
摘要:The construction of effective Recommender Systems (RS) is a complex process, mainly due to the nature of RSs which involves large scale software-systems and human interactions. Iterative development processes require deep understanding of a current baseline as well as the ability to estimate the impact of changes in multiple variables of interest. Simulations are well suited to address both challenges and potentially leading to a high velocity construction process, a fundamental requirement in commercial contexts. Recently, there has been significant interest in RS Simulation Platforms, which allow RS developers to easily craft simulated environments where their systems can be analysed. In this work we discuss how simulations help to increase velocity, we look at the literature around RS Simulation Platforms, analyse strengths and gaps and distill a set of guiding principles for the design of RS Simulation Platforms that we believe will maximize the velocity of iterative RS construction processes.


自动驾驶|车辆|车道检测等(2篇)

【1】 Machine-Learned Prediction Equilibrium for Dynamic Traffic Assignment
标题:动态交通分配的机器学习预测均衡
链接:https://arxiv.org/abs/2109.06713

作者:Lukas Graf,Tobias Harks,Kostas Kollias,Michael Markl
机构:University of Augsburg, Google
备注:26 pages including Appendix and Figures
摘要:我们研究了一个动态流量分配模型,其中代理根据实时延迟预测做出即时路由决策。我们建立了一个数学上简洁的模型,并导出了确保动态预测平衡存在的预测因子的性质。我们展示了我们的框架的多功能性,它包含了众所周知的完整信息和瞬时信息模型,并将进一步的现实预测作为特例。我们通过一项实验研究来补充我们的理论分析,在这项实验研究中,我们系统地比较了不同预测因子的诱导平均行程时间,包括一个机器学习模型,该模型基于从先前计算的平衡流中获得的数据,在合成道路网和真实道路网上进行训练。
摘要:We study a dynamic traffic assignment model, where agents base their instantaneous routing decisions on real-time delay predictions. We formulate a mathematically concise model and derive properties of the predictors that ensure a dynamic prediction equilibrium exists. We demonstrate the versatility of our framework by showing that it subsumes the well-known full information and instantaneous information models, in addition to admitting further realistic predictors as special cases. We complement our theoretical analysis by an experimental study, in which we systematically compare the induced average travel times of different predictors, including a machine-learning model trained on data gained from previously computed equilibrium flows, both on a synthetic and a real road network.


【2】 Detecting Safety Problems of Multi-Sensor Fusion in Autonomous Driving
标题:自动驾驶中多传感器融合的安全问题检测
链接:https://arxiv.org/abs/2109.06404

作者:Ziyuan Zhong,Zhisheng Hu,Shengjian Guo,Xinyang Zhang,Zhenyu Zhong,Baishakhi Ray
机构: Columbia University,  Baidu Security
摘要:近年来,自动驾驶(AD)系统蓬勃发展。通常,它们接收传感器数据,计算驾驶决策,并向车辆输出控制信号。为了消化传感器输入带来的不确定性,AD系统通常利用多传感器融合(MSF)来融合传感器输入,从而对周围环境形成更可靠的理解。然而,MSF无法完全消除不确定性,因为它并不知道哪个传感器提供的数据最准确,其结果是可能意外发生严重后果。在这项工作中,我们观察到,工业级高级驾驶辅助系统(ADAS)中流行的MSF方法会误导车辆控制并导致严重的安全隐患;无论使用何种融合方法,即使至少有一个传感器提供了准确数据,错误行为仍可能发生。为了将安全隐患归因于MSF方法,我们形式化地定义了融合错误,并提出了一种区分由此类错误因果引发的安全违规的方法。此外,我们开发了一种新颖的基于进化的领域特定搜索框架FusionFuzz,用于高效检测融合错误。我们在两种驾驶环境中对两种广泛使用的MSF方法评估了我们的框架。实验结果表明,FusionFuzz识别出150多个融合错误。最后,我们为改进所研究的MSF方法提出了若干建议。
摘要:Autonomous driving (AD) systems have been thriving in recent years. In general, they receive sensor data, compute driving decisions, and output control signals to the vehicles. To smooth out the uncertainties brought by sensor inputs, AD systems usually leverage multi-sensor fusion (MSF) to fuse the sensor inputs and produce a more reliable understanding of the surroundings. However, MSF cannot completely eliminate the uncertainties since it lacks the knowledge about which sensor provides the most accurate data. As a result, critical consequences might happen unexpectedly. In this work, we observed that the popular MSF methods in an industry-grade Advanced Driver-Assistance System (ADAS) can mislead the car control and result in serious safety hazards. Misbehavior can happen regardless of the used fusion methods and the accurate data from at least one sensor. To attribute the safety hazards to a MSF method, we formally define the fusion errors and propose a way to distinguish safety violations causally induced by such errors. Further, we develop a novel evolutionary-based domain-specific search framework, FusionFuzz, for the efficient detection of fusion errors. We evaluate our framework on two widely used MSF methods in two driving environments. Experimental results show that FusionFuzz identifies more than 150 fusion errors. Finally, we provide several suggestions to improve the MSF methods under study.


点云|SLAM|雷达|激光|深度RGBD相关(1篇)

【1】 Tesla-Rapture: A Lightweight Gesture Recognition System from mmWave  Radar Point Clouds
标题:TESLA-Rapture:一种基于毫米波雷达点云的轻量级手势识别系统
链接:https://arxiv.org/abs/2109.06448

作者:Dariush Salami,Ramin Hasibi,Sameera Palipana,Petar Popovski,Tom Michoel,Stephan Sigg
机构:Department of Informatics
备注:Submitted to IEEE Transactions on Mobile Computing; still under review
摘要:我们介绍了Tesla Rapture,一种用于毫米波雷达生成的点云的手势识别接口。最先进的手势识别模型要么过于消耗资源,要么不够精确,无法使用可穿戴或受限设备(如物联网设备(如Raspberry PI)、XR硬件(如HoloLens)或智能手机)集成到现实生活场景中。为了解决这个问题,我们开发了Tesla,一种用于毫米波雷达点云的消息传递神经网络(MPNN)图卷积方法。该模型在降低计算复杂度和执行时间的同时,在两个数据集上的精度优于最新技术。特别是,这种方法能够预测一个手势,比最准确的竞争者快近8倍。我们在不同场景(环境、角度、距离)下的性能评估表明,特斯拉在具有挑战性的场景(如穿墙设置和极端角度的感应)中具有良好的通用性,并将精确度提高了20%。利用Tesla,我们开发了Tesla Rapture,这是一种在Raspberry PI 4上使用毫米波雷达的实时实现,并评估了其准确性和时间复杂度。我们还发布了源代码、经过训练的模型以及嵌入式设备模型的实现。
摘要 :We present Tesla-Rapture, a gesture recognition interface for point clouds generated by mmWave Radars. State of the art gesture recognition models are either too resource consuming or not sufficiently accurate for integration into real-life scenarios using wearable or constrained equipment such as IoT devices (e.g. Raspberry PI), XR hardware (e.g. HoloLens), or smart-phones. To tackle this issue, we developed Tesla, a Message Passing Neural Network (MPNN) graph convolution approach for mmWave radar point clouds. The model outperforms the state of the art on two datasets in terms of accuracy while reducing the computational complexity and, hence, the execution time. In particular, the approach, is able to predict a gesture almost 8 times faster than the most accurate competitor. Our performance evaluation in different scenarios (environments, angles, distances) shows that Tesla generalizes well and improves the accuracy up to 20% in challenging scenarios like a through-wall setting and sensing at extreme angles. Utilizing Tesla, we develop Tesla-Rapture, a real-time implementation using a mmWave Radar on a Raspberry PI 4 and evaluate its accuracy and time-complexity. We also publish the source code, the trained models, and the implementation of the model for embedded devices.


联邦学习|隐私保护|加密(2篇)

【1】 Fast Federated Edge Learning with Overlapped Communication and  Computation and Channel-Aware Fair Client Scheduling
标题:具有重叠通信和计算和信道感知公平客户端调度的快速联合边缘学习
链接:https://arxiv.org/abs/2109.06710

作者:Mehmet Emre Ozfatura,Junlin Zhao,Deniz Gündüz
机构:†Department of Electrical and Electronic Engineering, Imperial College London, London, UK, School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, China
备注:Accepted in IEEE SPAWC 2021
摘要:我们考虑无线衰落信道上的联邦边缘学习(FEEL),同时考虑下行链路和上行链路的信道延迟以及客户端的随机计算延迟。我们通过让通信与计算重叠来加快训练过程:借助喷泉编码的全局模型更新传输,客户端异步接收全局模型,并立即开始执行本地计算。随后,我们提出了一种称为MRTP的动态客户端调度策略,用于将本地模型更新上载到参数服务器(PS),该策略在任一时刻调度剩余上载时间最小的客户端。然而,MRTP可能导致客户端在更新过程中的有偏参与,从而在非iid数据场景中造成性能下降。为克服这一问题,我们提出了两种考虑公平性的替代方案,分别称为年龄感知MRTP(A-MRTP)和机会公平MRTP(OF-MRTP)。在A-MRTP中,根据剩余客户端的剩余传输时间与更新年龄之比进行调度;而在OF-MRTP中,选择机制利用客户端的长期平均信道速率进一步降低延迟,同时确保客户端的公平参与。数值模拟表明,OF-MRTP在不牺牲测试精度的情况下显著降低了延迟。
摘要:We consider federated edge learning (FEEL) over wireless fading channels taking into account the downlink and uplink channel latencies, and the random computation delays at the clients. We speed up the training process by overlapping the communication with computation. With fountain coded transmission of the global model update, clients receive the global model asynchronously, and start performing local computations right away. Then, we propose a dynamic client scheduling policy, called MRTP, for uploading local model updates to the parameter server (PS), which, at any time, schedules the client with the minimum remaining upload time. However, MRTP can lead to biased participation of clients in the update process, resulting in performance degradation in non-iid data scenarios. To overcome this, we propose two alternative schemes with fairness considerations, termed as age-aware MRTP (A-MRTP), and opportunistically fair MRTP (OF-MRTP). In A-MRTP, the remaining clients are scheduled according to the ratio between their remaining transmission time and the update age, while in OF-MRTP, the selection mechanism utilizes the long term average channel rate of the clients to further reduce the latency while ensuring fair participation of the clients. It is shown through numerical simulations that OF-MRTP provides significant reduction in latency without sacrificing test accuracy.
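
MRTP的调度规则本身非常简单:任一时刻选剩余上载时间最小的客户端。下面是一个玩具示意,客户端的剩余比特数与速率均为假设数值:

```python
clients = {  # client_id: (模型更新剩余比特数, 当前上行速率 bit/s)
    "A": (4.0e6, 2.0e6),
    "B": (1.0e6, 0.5e6),
    "C": (3.0e6, 3.0e6),
}

def mrtp_schedule(clients):
    # 剩余上载时间 = 剩余比特 / 当前速率,取最小者
    return min(clients, key=lambda c: clients[c][0] / clients[c][1])

print(mrtp_schedule(clients))  # -> "C"(3e6 / 3e6 = 1 秒,最短)
```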


【2】 Bayesian AirComp with Sign-Alignment Precoding for Wireless Federated  Learning
标题:基于符号对齐预编码的贝叶斯AirComp无线联合学习
链接:https://arxiv.org/abs/2109.06579

作者:Chanho Park,Seunghoon Lee,Namyoon Lee
备注:8 pages, 4 figures. Extended version of a conference paper accepted at IEEE GLOBECOM 2021
摘要:在本文中,我们考虑基于符号随机梯度下降(signSGD)算法、经由多址信道的无线联邦学习问题。在发送本地计算的梯度符号信息时,每个移动设备都需要进行预编码以规避无线衰落效应。然而在实践中,让所有移动设备都获得完美的信道状态信息(CSI)是不可行的。本文提出了一种仅需有限信道知识、简单而有效的预编码方法,称为符号对齐预编码,其思想是防止无线衰落引起的符号翻转错误。在本地梯度服从高斯先验的假设下,我们还推导了均方误差(MSE)最优的聚合函数,称为贝叶斯空中计算(BayAirComp)。我们的主要发现是:与现有预编码方法相比,即使后者在AirComp聚合下拥有完美CSI,采用BayAirComp聚合的一比特预编码方法也能提供更好的学习性能。
摘要:In this paper, we consider the problem of wireless federated learning based on sign stochastic gradient descent (signSGD) algorithm via a multiple access channel. When sending locally computed gradient's sign information, each mobile device requires to apply precoding to circumvent wireless fading effects. In practice, however, acquiring perfect knowledge of channel state information (CSI) at all mobile devices is infeasible. In this paper, we present a simple yet effective precoding method with limited channel knowledge, called sign-alignment precoding. The idea of sign-alignment precoding is to protect sign-flipping errors from wireless fadings. Under the Gaussian prior assumption on the local gradients, we also derive the mean squared error (MSE)-optimal aggregation function called Bayesian over-the-air computation (BayAirComp). Our key finding is that one-bit precoding with BayAirComp aggregation can provide a better learning performance than the existing precoding method even using perfect CSI with AirComp aggregation.
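
下面用实数衰落的简化模型示意符号对齐预编码的机制:发送端用sign(h)预先抵销信道符号,空中叠加后等效于按|h|加权的符号多数表决。信道模型与维度均为示意性假设,并非论文的系统模型:

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 5, 8                                  # 设备数、参数维度
g = rng.normal(size=(K, d))                  # 各设备的本地梯度
h = rng.normal(size=(K, d))                  # 实数衰落系数(可正可负)

tx = np.sign(g) * np.sign(h)                 # 符号对齐预编码:预先抵销信道符号
rx = np.sum(h * tx, axis=0)                  # 多址信道上的叠加(空中计算)
global_sign = np.sign(rx)                    # 等效于按 |h| 加权的多数表决

# 验证:聚合结果确实等于 |h| 加权的符号和的符号
print(np.mean(global_sign == np.sign(np.sum(np.abs(h) * np.sign(g), axis=0))))  # 1.0
```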


推理|分析|理解|解释(5篇)

【1】 Variation-Incentive Loss Re-weighting for Regression Analysis on Biased  Data
标题:有偏数据回归分析的变异激励损失加权
链接:https://arxiv.org/abs/2109.06565

作者:Wentai Wu,Ligang He,Weiwei Lin
机构:Department of Computer Science, University of Warwick; School of Computer Science and Engineering, South China University of Technology
备注:9 pages with appendix
摘要:分类和回归任务都容易受到训练数据有偏分布的影响。然而,现有的方法主要集中于类不平衡学习,不能应用于学习目标是连续值而不是离散标签的数值回归问题。在本文中,我们旨在通过解决模型训练期间的数据偏斜/偏差来提高回归分析的准确性。我们首先引入两个度量,唯一性和异常性,从特征(即输入)空间和目标(即输出)空间的角度反映本地化的数据分布。结合这两个指标,我们提出了一种变异激励损失重加权方法(VILoss)来优化基于梯度下降的回归分析模型训练。我们对合成数据集和真实数据集进行了全面的实验。结果表明,当使用VILoss作为训练中的损失准则时,模型质量显著提高(误差减少高达11.9%)。
摘要:Both classification and regression tasks are susceptible to the biased distribution of training data. However, existing approaches are focused on the class-imbalanced learning and cannot be applied to the problems of numerical regression where the learning targets are continuous values rather than discrete labels. In this paper, we aim to improve the accuracy of the regression analysis by addressing the data skewness/bias during model training. We first introduce two metrics, uniqueness and abnormality, to reflect the localized data distribution from the perspectives of their feature (i.e., input) space and target (i.e., output) space. Combining these two metrics we propose a Variation-Incentive Loss re-weighting method (VILoss) to optimize the gradient descent-based model training for regression analysis. We have conducted comprehensive experiments on both synthetic and real-world data sets. The results show significant improvement in the model quality (reduction in error by up to 11.9%) when using VILoss as the loss criterion in training.
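
下面给出"从特征空间与目标空间两个角度为样本加权"这一思路的示意:用k近邻平均距离近似"唯一性"、用标准化目标偏差近似"异常性",再组合成损失权重。具体度量与组合方式均为假设,并非论文原式:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X, y = rng.normal(size=(300, 5)), rng.normal(size=300)

knn = NearestNeighbors(n_neighbors=10).fit(X)
dist, _ = knn.kneighbors(X)
uniqueness = dist.mean(axis=1)                          # 特征空间越稀疏越"唯一"
abnormality = np.abs(y - y.mean()) / (y.std() + 1e-8)   # 目标偏离程度

w = 1.0 + uniqueness / uniqueness.mean() + abnormality  # 示意性的组合权重

def reweighted_loss(pred, target, w):
    return np.mean(w * (pred - target) ** 2)            # 重加权均方误差

print(reweighted_loss(np.zeros_like(y), y, w))
```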


【2】 Anomaly Attribution of Multivariate Time Series using Counterfactual  Reasoning
标题:基于反事实推理的多变量时间序列异常属性分析
链接:https://arxiv.org/abs/2109.06562

作者:Violeta Teodora Trifunov,Maha Shadaydeh,Björn Barz,Joachim Denzler
机构:Computer Vision Group, Friedrich Schiller University Jena, Jena, Germany; Michael Stifel Center Jena for Data-Driven and Simulation Science; German Aerospace Center (DLR), Institute for Data Science
备注:ICMLA 2021
摘要:有许多方法可以检测时间序列中的异常,但这只是理解它们的第一步。我们努力通过解释这些异常来超越这一点。因此,我们开发了一种基于反事实推理的多元时间序列归因方案。我们的目的是回答一个反事实的问题,即如果所涉及变量的子集更类似地分布到异常区间之外的数据,那么异常事件是否会发生。具体而言,我们使用最大发散区间(MDI)算法检测异常区间,用其在检测区间内的分布值替换变量子集,并通过使用MDI重新评分来观察区间是否变得不那么异常。我们根据多元时间和时空数据评估了我们的方法,并确认了我们对多个众所周知的极端气候事件(如热浪和飓风)异常归因的准确性。
摘要:There are numerous methods for detecting anomalies in time series, but that is only the first step to understanding them. We strive to exceed this by explaining those anomalies. Thus we develop a novel attribution scheme for multivariate time series relying on counterfactual reasoning. We aim to answer the counterfactual question of would the anomalous event have occurred if the subset of the involved variables had been more similarly distributed to the data outside of the anomalous interval. Specifically, we detect anomalous intervals using the Maximally Divergent Interval (MDI) algorithm, replace a subset of variables with their in-distribution values within the detected interval and observe if the interval has become less anomalous, by re-scoring it with MDI. We evaluate our method on multivariate temporal and spatio-temporal data and confirm the accuracy of our anomaly attribution of multiple well-understood extreme climate events such as heatwaves and hurricanes.


【3】 From Heatmaps to Structural Explanations of Image Classifiers
标题:从热图到图像分类器的结构解释
链接:https://arxiv.org/abs/2109.06365

作者:Li Fuxin,Zhongang Qi,Saeed Khorram,Vivswan Shitole,Prasad Tadepalli,Minsuk Kahng,Alan Fern
机构:School of EECS, Oregon State University, OR, USA; Applied Research Center (ARC), Tencent PCG, Guangdong, China; Kelley Engineering Center, Oregon State University, Corvallis, OR, USA
备注:Submitted to Applied AI Letters
摘要:本文总结了我们在过去几年中在解释图像分类器方面所做的努力,目的是包括我们所获得的负面结果和见解。本文首先描述了可解释神经网络(XNN),它试图纯粹从深层网络中提取并可视化几个高层概念,而不依赖人类的语言概念。这有助于用户理解不那么直观的网络分类,并大大提高了用户在区分不同种类海鸥这一困难的细粒度分类任务中的性能。意识到一个重要的缺失是一个可靠的热图可视化工具,我们开发了I-GOS和iGOS++,利用集成梯度避免热图生成中的局部最优,从而提高了所有分辨率的性能。在这些可视化的开发过程中,我们意识到,对于大量的图像,分类器有多条不同的路径来实现可靠的预测。这导致我们最近开发了结构化注意图(SAG),这是一种利用光束搜索为单个图像定位多个粗略热图的方法,并通过捕获图像区域的不同组合如何影响分类器的置信度来紧凑地可视化一组热图。通过研究过程,我们学到了很多关于构建深度网络解释的见解,多重解释的存在和频率,以及使解释有效的各种交易技巧。在本文中,我们试图与读者分享这些见解和观点,希望其中一些能够为未来的研究者提供关于可解释的深度学习的信息。
摘要:This paper summarizes our endeavors in the past few years in terms of explaining image classifiers, with the aim of including negative results and insights we have gained. The paper starts with describing the explainable neural network (XNN), which attempts to extract and visualize several high-level concepts purely from the deep network, without relying on human linguistic concepts. This helps users understand network classifications that are less intuitive and substantially improves user performance on a difficult fine-grained classification task of discriminating among different species of seagulls.  Realizing that an important missing piece is a reliable heatmap visualization tool, we have developed I-GOS and iGOS++ utilizing integrated gradients to avoid local optima in heatmap generation, which improved the performance across all resolutions. During the development of those visualizations, we realized that for a significant number of images, the classifier has multiple different paths to reach a confident prediction. This has lead to our recent development of structured attention graphs (SAGs), an approach that utilizes beam search to locate multiple coarse heatmaps for a single image, and compactly visualizes a set of heatmaps by capturing how different combinations of image regions impact the confidence of a classifier.  Through the research process, we have learned much about insights in building deep network explanations, the existence and frequency of multiple explanations, and various tricks of the trade that make explanations work. In this paper, we attempt to share those insights and opinions with the readers with the hope that some of them will be informative for future researchers on explainable deep learning.
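
I-GOS所依赖的积分梯度(integrated gradients)本身可以用几行代码实现:沿基线到输入的直线路径对梯度取平均,再与输入差相乘得到逐元素归因。下面是一个极简示意,模型与步数均为假设:

```python
import torch

def integrated_gradients(model, x, baseline, steps=50):
    alphas = torch.linspace(0, 1, steps).view(-1, 1)
    path = baseline + alphas * (x - baseline)      # 从基线到输入的直线路径
    path.requires_grad_(True)
    model(path).sum().backward()                   # 对路径上各点求梯度
    avg_grad = path.grad.mean(dim=0)               # 路径平均梯度
    return (x - baseline).squeeze(0) * avg_grad    # 逐元素归因

model = torch.nn.Sequential(torch.nn.Linear(4, 1))
x, baseline = torch.randn(1, 4), torch.zeros(1, 4)
print(integrated_gradients(model, x, baseline))
```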


【4】 A Massively Multilingual Analysis of Cross-linguality in Shared  Embedding Space
标题:共享嵌入空间中跨语言的大规模多语言分析
链接:https://arxiv.org/abs/2109.06324

作者:Alex Jones,William Yang Wang,Kyle Mahowald
机构:Dartmouth College, University of California, Santa Barbara, University of Texas at Austin
备注:15 pages, 8 figures, EMNLP 2021
摘要:在跨语言模型中,许多不同语言的表示共存于同一空间。在此,我们研究了影响跨语言预训练语言模型中句子级对齐的语言与非语言因素,涵盖101种语言和5050个语言对。我们以基于BERT的LaBSE和基于BiLSTM的LASER为模型、以《圣经》为语料,计算了一个基于任务的跨语言对齐度量(双文本检索性能),以及向量空间对齐与同构的四个内在度量。然后,我们考察了一系列语言学、准语言学和与训练相关的特征,作为这些对齐度量的潜在预测因子。分析结果表明,语序一致性和形态复杂度一致性是跨语言性最强的两个语言学预测因子。我们还注意到,同语系训练数据在整体上是比特定语言训练数据更强的预测因子。除了在另一语料上考察语序一致性对66个Zero-Shot语言对同构性的影响外,我们还通过观察形态切分对英语-因纽特语对齐的影响来验证我们的部分语言学发现。我们公开了实验的数据和代码。
摘要:In cross-lingual language models, representations for many different languages live in the same space. Here, we investigate the linguistic and non-linguistic factors affecting sentence-level alignment in cross-lingual pretrained language models for 101 languages and 5,050 language pairs. Using BERT-based LaBSE and BiLSTM-based LASER as our models, and the Bible as our corpus, we compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance, as well as four intrinsic measures of vector space alignment and isomorphism. We then examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics. The results of our analyses show that word order agreement and agreement in morphological complexity are two of the strongest linguistic predictors of cross-linguality. We also note in-family training data as a stronger predictor than language-specific training data across the board. We verify some of our linguistic findings by looking at the effect of morphological segmentation on English-Inuktitut alignment, in addition to examining the effect of word order agreement on isomorphism for 66 zero-shot language pairs from a different corpus. We make the data and code for our experiments publicly available.
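
文中的任务型对齐度量(双文本检索)可以概括为:在共享嵌入空间中,若每个源句的最近邻恰为其译文,则检索精度为1。下面用随机向量代替真实嵌入做一个示意(实际嵌入应来自LaBSE或LASER):

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 100, 32
src = rng.normal(size=(n, dim))
tgt = src + 0.1 * rng.normal(size=(n, dim))      # 理想情况:译文嵌入与原文接近

src = src / np.linalg.norm(src, axis=1, keepdims=True)
tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
sim = src @ tgt.T                                 # 余弦相似度矩阵
precision_at_1 = np.mean(sim.argmax(axis=1) == np.arange(n))
print(precision_at_1)                             # 对齐越好,该值越接近 1
```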


【5】 Towards Better Model Understanding with Path-Sufficient Explanations
标题:通过路径充分的解释实现更好的模型理解
链接:https://arxiv.org/abs/2109.06181

作者:Ronny Luss,Amit Dhurandhar
机构:IBM Research, Yorktown Heights, NY USA
摘要:基于特征的局部归因方法是可解释人工智能(XAI)文献中最流行的方法之一。除标准相关性之外,最近有人提出了一些方法,强调哪些输入至少足以证明其分类的合理性(即相关正因素,pertinent positives)。尽管最小充分性是一个有吸引力的性质,但由此产生的解释往往过于稀疏,以至于人类难以理解和评估模型的局部行为,从而难以判断其整体质量。为克服这些局限,我们提出了一种称为路径充分解释方法(PSEM)的新方法:对给定输入,它输出一系列大小(或值)严格递减的充分解释,从原始输入一直到最小充分解释。这可以看作以平滑方式追踪模型的局部边界,从而为该输入处的局部模型行为提供更好的直觉。我们通过定性和定量实验验证了这些论断,实验表明PSEM在图像、表格和文本三种模态上均有优势。一项用户研究展示了该方法在传达局部行为方面的优势,其中(许多)用户能够正确判断模型所做的预测。
摘要:Feature based local attribution methods are amongst the most prevalent in explainable artificial intelligence (XAI) literature. Going beyond standard correlation, recently, methods have been proposed that highlight what should be minimally sufficient to justify the classification of an input (viz. pertinent positives). While minimal sufficiency is an attractive property, the resulting explanations are often too sparse for a human to understand and evaluate the local behavior of the model, thus making it difficult to judge its overall quality. To overcome these limitations, we propose a novel method called Path-Sufficient Explanations Method (PSEM) that outputs a sequence of sufficient explanations for a given input of strictly decreasing size (or value) -- from original input to a minimally sufficient explanation -- which can be thought to trace the local boundary of the model in a smooth manner, thus providing better intuition about the local model behavior for the specific input. We validate these claims, both qualitatively and quantitatively, with experiments that show the benefit of PSEM across all three modalities (image, tabular and text). A user study depicts the strength of the method in communicating the local behavior, where (many) users are able to correctly determine the prediction made by a model.


检测相关(1篇)

【1】 A geometric perspective on functional outlier detection
标题:函数离群点检测的几何视角
链接:https://arxiv.org/abs/2109.06849

作者:Moritz Herrmann,Fabian Scheipl
机构 :Department of Statistics, Ludwig-Maximilians-University, Ludwigstr. , Munich, Germany
备注:40 pages, 20 figures
摘要:我们从几何的角度考虑函数型数据的离群检测,具体而言:函数型数据集取自一个由数据在振幅和相位上的变化模式所定义的函数流形。基于这一流形,我们提出了一种比以往更广泛适用、也更切合实际的函数型离群检测概念化。我们的理论和实验分析证明了这一视角的几个重要优点:它极大地增进了理论理解,并能够一致且完全一般地描述和分析复杂的函数型离群场景,将偏离流形的结构性异常数据与位于流形上但处于其边缘的分布性离群数据区分开来。这也提高了函数型离群检测的实际可行性:我们证明,简单的流形学习方法可以用来可靠地推断和可视化函数型数据集的几何结构。我们还表明,只需将流形学习方法学到的向量值表示用作输入特征,需要表格数据输入的标准离群检测方法就可以非常成功地应用于函数型数据。我们在合成和真实数据集上的实验表明,在各种各样的设定下,这种方法的离群检测性能至少与现有的函数型数据专用方法相当,而无需后者通常所需的高度专门化的复杂方法和狭窄的适用领域。
摘要:We consider functional outlier detection from a geometric perspective, specifically: for functional data sets drawn from a functional manifold which is defined by the data's modes of variation in amplitude and phase. Based on this manifold, we develop a conceptualization of functional outlier detection that is more widely applicable and realistic than previously proposed. Our theoretical and experimental analyses demonstrate several important advantages of this perspective: It considerably improves theoretical understanding and allows to describe and analyse complex functional outlier scenarios consistently and in full generality, by differentiating between structurally anomalous outlier data that are off-manifold and distributionally outlying data that are on-manifold but at its margins. This improves practical feasibility of functional outlier detection: We show that simple manifold learning methods can be used to reliably infer and visualize the geometric structure of functional data sets. We also show that standard outlier detection methods requiring tabular data inputs can be applied to functional data very successfully by simply using their vector-valued representations learned from manifold learning methods as input features. Our experiments on synthetic and real data sets demonstrate that this approach leads to outlier detection performances at least on par with existing functional data-specific methods in a large variety of settings, without the highly specialized, complex methodology and narrow domain of application these methods often entail.
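
文中"流形学习表示 + 标准表格型离群检测器"的组合可以直接用scikit-learn搭出来。下面是一个在模拟函数型数据上的示意;具体方法(Isomap、IsolationForest)与数据生成过程均为示例性选择,并非论文的完整实验设置:

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
normal = np.sin(2 * np.pi * (t + rng.uniform(0, 0.2, (95, 1))))  # 相位变化的正常曲线
outlier = rng.normal(0, 1, (5, 50))                              # 结构异常曲线
X = np.vstack([normal, outlier])

emb = Isomap(n_components=2).fit_transform(X)   # 推断流形几何结构的低维表示
scores = IsolationForest(random_state=0).fit(emb).decision_function(emb)
print(np.argsort(scores)[:5])                   # 得分最低者应多为 95..99 号异常曲线
```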


分类|识别(2篇)

【1】 Generatively Augmented Neural Network Watchdog for Image Classification  Networks
标题:用于图像分类网络的生成式增广神经网络看门狗
链接:https://arxiv.org/abs/2109.06168

作者:Justin M. Bui,Glauco A. Amigo,Robert J. Marks II
机构:Department of Electrical and Computer Engineering, Baylor University, Waco, TX
备注:9 Pages, 22 Figures
摘要:识别分布外数据对于分类网络的部署至关重要。例如,一个训练用于区分猫狗图像的通用神经网络只能将输入分类为狗或猫:如果把一张汽车或金桔的图片交给该分类器,结果仍然是狗或猫。为缓解这一问题,人们开发了神经网络看门狗等技术。将输入图像压缩到自动编码器的潜在层,界定了图像空间中的分布内区域;这一分布内输入数据集在图像空间中具有相应的边界,看门狗评估输入位于该边界之内还是之外。本文演示了如何利用生成网络进行训练数据增广来锐化这一边界,从而提升看门狗的辨别能力和整体性能。
摘要:The identification of out-of-distribution data is vital to the deployment of classification networks. For example, a generic neural network that has been trained to differentiate between images of dogs and cats can only classify an input as either a dog or a cat. If a picture of a car or a kumquat were to be supplied to this classifier, the result would still be either a dog or a cat. In order to mitigate this, techniques such as the neural network watchdog have been developed. The compression of the image input into the latent layer of the autoencoder defines the region of in-distribution in the image space. This in-distribution set of input data has a corresponding boundary in the image space. The watchdog assesses whether inputs are in inside or outside this boundary. This paper demonstrates how to sharpen this boundary using generative network training data augmentation thereby bettering the discrimination and overall performance of the watchdog.
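
看门狗机制的核心是"用重构误差界定分布内边界"。下面给出一个未含生成式增广的最小示意(自动编码器结构与阈值均为假设,实际使用前需先在分布内数据上训练):

```python
import torch
import torch.nn as nn

ae = nn.Sequential(
    nn.Linear(784, 64), nn.ReLU(),   # 编码:压缩到潜在层
    nn.Linear(64, 784),              # 解码:重构输入
)

def is_out_of_distribution(x, threshold=0.5):
    err = torch.mean((ae(x) - x) ** 2, dim=1)  # 逐样本重构误差
    return err > threshold                     # 误差越界 -> 判为分布外

x = torch.rand(10, 784)
print(is_out_of_distribution(x))
```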


【2】 Specified Certainty Classification, with Application to Read  Classification for Reference-Guided Metagenomic Assembly
标题:特定确定性分类及其在参考引导元基因组组装阅读分类中的应用
链接:https://arxiv.org/abs/2109.06677

作者:Alan F. Karr,Jason Hauzel,Prahlad Menon,Adam A. Porter,Marcel Schaefer
机构:Center Mid-Atlantic, Fraunhofer USA, Riverdale, MD
摘要:特定确定性分类(SCC)是一种使用分类器的新范式,这类分类器的输出带有不确定性,通常以贝叶斯后验概率的形式给出。通过允许分类器的输出比一组原子决策中的单个决策更不精确,SCC使所有决策都能达到指定的确定性水平,并通过检查所有可能的决策来洞察分类器的行为。我们的主要示例是参考引导基因组组装中的读段(read)分类,但我们也通过分析COVID-19疫苗接种数据来展示SCC的广泛适用性。
摘要:Specified Certainty Classification (SCC) is a new paradigm for employing classifiers whose outputs carry uncertainties, typically in the form of Bayesian posterior probabilities. By allowing the classifier output to be less precise than one of a set of atomic decisions, SCC allows all decisions to achieve a specified level of certainty, as well as provides insights into classifier behavior by examining all decisions that are possible. Our primary illustration is read classification for reference-guided genome assembly, but we demonstrate the breadth of SCC by also analyzing COVID-19 vaccination data.
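
SCC的决策规则可以概括为:输出后验概率累计和达到指定确定性水平的最小类别集合。下面是一个直接的实现示意,阈值为任取:

```python
import numpy as np

def scc_decision(posterior, certainty=0.95):
    order = np.argsort(posterior)[::-1]           # 按后验从大到小排序
    cum = np.cumsum(posterior[order])
    k = int(np.searchsorted(cum, certainty)) + 1  # 达到阈值所需的最少类别数
    return sorted(order[:k].tolist())

posterior = np.array([0.55, 0.30, 0.10, 0.05])
print(scc_decision(posterior))        # [0, 1, 2]:三类合起来才达到 95% 确定
print(scc_decision(posterior, 0.5))   # [0]      :单类即可满足 50% 确定
```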


表征(1篇)

【1】 Program-to-Circuit: Exploiting GNNs for Program Representation and  Circuit Translation
标题:程序到电路:利用GNN进行程序表示和电路翻译
链接:https://arxiv.org/abs/2109.06265

作者:Nan Wu,Huake He,Yuan Xie,Pan Li,Cong Hao
机构:University of California, Santa Barbara, Santa Barbara, CA, USA, Purdue University, West Lafayette, IN, USA, Georgia Institute of Technology, Atlanta, GA, USA
摘要:电路设计非常复杂,需要广泛的领域专业知识。硬件敏捷开发道路上的一个主要障碍是,精确的电路质量评估过程相当耗时。为了显著加快从行为级语言到电路设计转换过程中的电路评估,我们将其表述为一个程序到电路(Program-to-Circuit)问题,旨在通过将C/C++程序表示为图来利用图神经网络(GNN)的表示能力。这项工作的目标有四个方面。首先,我们构建了一个包含4万个C/C++程序的标准基准,其中每个程序都被转换为带有真实硬件质量指标的电路设计,旨在促进面向这一高需求电路设计领域的有效GNN的开发。其次,我们在程序到电路问题上分析了14种最先进的GNN模型,指出了该问题中应当仔细处理、但现有GNN尚未解决的关键设计挑战,目的是为设计具有合适归纳偏置的GNN提供领域知识。第三,我们讨论了用于GNN泛化评估的三组真实世界基准,并分析了标准程序与真实程序之间的性能差距,目标是实现从有限训练数据到真实世界大规模电路设计问题的迁移学习。第四,程序到电路问题是程序到X(Program-to-X)框架中的一个代表性问题,后者是一组带有各种下游任务的基于程序的分析问题。深入理解在程序到电路上应用GNN的长处与短板,将使整个程序到X家族大为受益。作为这一方向的先行者,我们期待更多GNN方面的努力来革新这一高需求的程序到电路问题,并丰富GNN在程序上的表达能力。
摘要:Circuit design is complicated and requires extensive domain-specific expertise. One major obstacle stuck on the way to hardware agile development is the considerably time-consuming process of accurate circuit quality evaluation. To significantly expedite the circuit evaluation during the translation from behavioral languages to circuit designs, we formulate it as a Program-to-Circuit problem, aiming to exploit the representation power of graph neural networks (GNNs) by representing C/C++ programs as graphs. The goal of this work is four-fold. First, we build a standard benchmark containing 40k C/C++ programs, each of which is translated to a circuit design with actual hardware quality metrics, aiming to facilitate the development of effective GNNs targeting this high-demand circuit design area. Second, 14 state-of-the-art GNN models are analyzed on the Program-to-Circuit problem. We identify key design challenges of this problem, which should be carefully handled but not yet solved by existing GNNs. The goal is to provide domain-specific knowledge for designing GNNs with suitable inductive biases. Third, we discuss three sets of real-world benchmarks for GNN generalization evaluation, and analyze the performance gap between standard programs and the real-case ones. The goal is to enable transfer learning from limited training data to real-world large-scale circuit design problems. Fourth, the Program-to-Circuit problem is a representative within the Program-to-X framework, a set of program-based analysis problems with various downstream tasks. The in-depth understanding of strength and weaknesses in applying GNNs on Program-to-Circuit could largely benefit the entire family of Program-to-X. Pioneering in this direction, we expect more GNN endeavors to revolutionize this high-demand Program-to-Circuit problem and to enrich the expressiveness of GNNs on programs.


优化|敛散性(2篇)

【1】 Policy Optimization Using Semiparametric Models for Dynamic Pricing
标题:基于半参数模型的动态定价策略优化
链接:https://arxiv.org/abs/2109.06368

作者:Jianqing Fan,Yongyi Guo,Mengxin Yu
备注:60 pages, 18 figures
摘要:在本文中,我们研究了上下文动态定价问题,其中产品的市场价值在其观察到的特征上是线性的,再加上一些市场噪声。产品每次只销售一个,并且只能观察到一个表示销售成功或失败的二元响应。我们的模型设置与Javanmard和Nazerzadeh[2019]类似,不同之处在于我们将需求曲线扩展为半参数模型,需要动态地同时学习参数和非参数组件。我们提出了一种动态统计学习和决策策略,该策略将未知链接函数的广义线性模型的半参数估计与在线决策相结合,以最小化遗憾(最大化收益)。在温和的条件下,我们证明了对于具有$m$阶导数($m\geq 2$)的市场噪声c.d.f. $F(\cdot)$,我们的策略实现了遗憾上界$\tilde{O}_{d}(T^{\frac{2m+1}{4m-1}})$,其中$T$是时间范围,$\tilde{O}_{d}$表示隐藏了对数项和特征维数$d$的阶。如果$F$是傅里叶变换指数衰减的超光滑函数,则上界进一步缩减为$\tilde{O}_{d}(\sqrt{T})$。就对时间范围$T$的依赖性而言,这些上界接近$\Omega(\sqrt{T})$,即$F$属于参数类时的下界。我们进一步将这些结果推广到强混合条件下具有动态相关产品特征的情形。
摘要:In this paper, we study the contextual dynamic pricing problem where the market value of a product is linear in its observed features plus some market noise. Products are sold one at a time, and only a binary response indicating success or failure of a sale is observed. Our model setting is similar to Javanmard and Nazerzadeh [2019] except that we expand the demand curve to a semiparametric model and need to learn dynamically both parametric and nonparametric components. We propose a dynamic statistical learning and decision-making policy that combines semiparametric estimation from a generalized linear model with an unknown link and online decision-making to minimize regret (maximize revenue). Under mild conditions, we show that for a market noise c.d.f. $F(\cdot)$ with $m$-th order derivative ($m\geq 2$), our policy achieves a regret upper bound of $\tilde{O}_{d}(T^{\frac{2m+1}{4m-1}})$, where $T$ is time horizon and $\tilde{O}_{d}$ is the order that hides logarithmic terms and the dimensionality of feature $d$. The upper bound is further reduced to $\tilde{O}_{d}(\sqrt{T})$ if $F$ is super smooth whose Fourier transform decays exponentially. In terms of dependence on the horizon $T$, these upper bounds are close to $\Omega(\sqrt{T})$, the lower bound where $F$ belongs to a parametric class. We further generalize these results to the case with dynamically dependent product features under the strong mixing condition.


【2】 Automatic Tuning of Tensorflow's CPU Backend using Gradient-Free  Optimization Algorithms
标题:利用无梯度优化算法自动调整TensorFlow的CPU后端
链接:https://arxiv.org/abs/2109.06266

作者:Derssie Mebratu,Niranjan Hasabnis,Pietro Mercati,Gaurit Sharma,Shamima Najnin
机构: Intel Corporation, Hillsboro, Oregon, USA,  Intel Labs, Santa Clara, California, USA
备注:To appear in the Proceedings of the Machine Learning on HPC Systems (MLHPCS) workshop held in conjunction with International Supercomputing Conference (ISC), July 2, 2021
摘要:现代深度学习(DL)应用程序是使用DL库和框架(如TensorFlow和PyTorch)构建的。这些框架具有复杂的参数,调整它们以获得良好的训练和推理性能对于典型用户(如DL开发人员和数据科学家)来说是一项挑战。手动调优需要深入了解DL框架的用户可控参数以及底层硬件。这是一个缓慢而乏味的过程,通常提供次优的解决方案。在本文中,我们将调整DL框架的参数以提高训练和推理性能的问题视为一个黑盒优化问题。然后,我们研究了贝叶斯优化(BO)、遗传算法(GA)和Nelder-Mead单纯形(NMS)在调整TensorFlow CPU后端参数方面的适用性和有效性。虽然之前的工作已经研究了Nelder-Mead单纯形在类似问题中的应用,但它并没有深入了解其他更流行算法的适用性。为此,我们对在各种DL模型上调优TensorFlow的CPU后端的所有三种算法进行了系统的比较分析。我们的发现表明,贝叶斯优化在大多数模型上表现最好。然而,在某些情况下,它并不能带来最好的结果。
摘要:Modern deep learning (DL) applications are built using DL libraries and frameworks such as TensorFlow and PyTorch. These frameworks have complex parameters and tuning them to obtain good training and inference performance is challenging for typical users, such as DL developers and data scientists. Manual tuning requires deep knowledge of the user-controllable parameters of DL frameworks as well as the underlying hardware. It is a slow and tedious process, and it typically delivers sub-optimal solutions.  In this paper, we treat the problem of tuning parameters of DL frameworks to improve training and inference performance as a black-box optimization problem. We then investigate applicability and effectiveness of Bayesian optimization (BO), genetic algorithm (GA), and Nelder-Mead simplex (NMS) to tune the parameters of TensorFlow's CPU backend. While prior work has already investigated the use of Nelder-Mead simplex for a similar problem, it does not provide insights into the applicability of other more popular algorithms. Towards that end, we provide a systematic comparative analysis of all three algorithms in tuning TensorFlow's CPU backend on a variety of DL models. Our findings reveal that Bayesian optimization performs the best on the majority of models. There are, however, cases where it does not deliver the best results.


预测|估计(2篇)

【1】 Predicting Loss Risks for B2B Tendering Processes
标题:预测B2B招标过程中的损失风险
链接:https://arxiv.org/abs/2109.06815

作者:Eelaaf Zahid,Yuya Jeremy Ong,Aly Megahed,Taiga Nakamura
机构:IBM Research - Almaden, San Jose, CA, USA
摘要:为众多商机维护着与多个客户的销售投标管道的卖家和高管,可以从对其每个投标健康状况的数据驱动洞察中获益匪浅。现有许多预测模型可以为这些机会提供赢单可能性洞察和赢单预测建模。目前,这些赢单预测模型以二元分类的形式出现,只预测赢或输的可能性。二元形式无法解释为什么某笔交易会被预测为输单。本文提供了一个多类别分类模型来预测赢单概率,其中三个输单类别给出了预测输单的具体原因,包括未投标、客户未继续推进以及在竞争中落败。鉴于预测的性质,这些类别为如何处理该机会提供了指引。除了给出多类别分类的基线结果外,本文还给出了类别不平衡处理后的模型结果,达到了85%的高准确率和0.94的平均AUC分数。
摘要:Sellers and executives who maintain a bidding pipeline of sales engagements with multiple clients for many opportunities significantly benefit from data-driven insight into the health of each of their bids. There are many predictive models that offer likelihood insights and win prediction modeling for these opportunities. Currently, these win prediction models are in the form of binary classification and only make a prediction for the likelihood of a win or loss. The binary formulation is unable to offer any insight as to why a particular deal might be predicted as a loss. This paper offers a multi-class classification model to predict win probability, with the three loss classes offering specific reasons as to why a loss is predicted, including no bid, customer did not pursue, and lost to competition. These classes offer an indicator of how that opportunity might be handled given the nature of the prediction. Besides offering baseline results on the multi-class classification, this paper also offers results on the model after class imbalance handling, with the results achieving a high accuracy of 85% and an average AUC score of 0.94.


【2】 Tuna-AI: tuna biomass estimation with Machine Learning models trained on  oceanography and echosounder FAD data
标题:金枪鱼-AI:基于海洋学和回声测深仪FAD数据训练的机器学习模型的金枪鱼生物量估计
链接:https://arxiv.org/abs/2109.06732

作者:Daniel Precioso,Manuel Navarro-García,Kathryn Gavira-O'Neill,Alberto Torres-Barrán,David Gordo,Victor Gallego-Alcalá,David Gómez-Ullate
机构:Department of Computer Science, Higher School of Engineering, Universidad de Cádiz, Spain, Manuel Navarro-García, Universidad Carlos III de Madrid, Spain, Komorebi AI Technologies, Madrid, Spain, Kathryn Gavira-O'Neill, Satlink, Alberto Torres-Barrán
摘要:附着在漂流集鱼装置(FAD)上的浮标所记录的回声测深仪数据,为了解金枪鱼种群及其行为提供了非常有价值的信息来源。当这些数据辅以来自CMEMS的海洋学数据时,其价值进一步提升。我们利用这些资源开发了Tuna-AI:一种旨在预测给定浮标下金枪鱼生物量的机器学习模型,它使用为期3天的回声测深仪数据窗口来捕捉金枪鱼群特有的每日时空模式。作为训练的监督信号,我们使用了5000多次下网(set)事件及AGAC金枪鱼围网船队报告的相应金枪鱼捕获量。
摘要:Echo-sounder data registered by buoys attached to drifting FADs provide a very valuable source of information on populations of tuna and their behaviour. This value increases when these data are supplemented with oceanographic data coming from CMEMS. We use these sources to develop Tuna-AI, a Machine Learning model aimed at predicting tuna biomass under a given buoy, which uses a 3-day window of echo-sounder data to capture the daily spatio-temporal patterns characteristic of tuna schools. As the supervised signal for training, we employ more than 5000 set events with their corresponding tuna catch reported by the AGAC tuna purse seine fleet.


其他神经网络|深度学习|模型|建模(15篇)

【1】 Greenformer: Factorization Toolkit for Efficient Deep Neural Networks
标题:Greenformer:高效深度神经网络的因式分解工具包
链接:https://arxiv.org/abs/2109.06762

作者:Samuel Cahyawijaya,Genta Indra Winata,Holy Lovenia,Bryan Wilie,Wenliang Dai,Etsuko Ishii,Pascale Fung
机构:Center for Artificial Intelligence Research (CAiRE), The Hong Kong University of Science and Technology
摘要:虽然深度神经网络(DNN)的最新进展取得了显著的成功,但计算成本也显著增加。在本文中,我们介绍了Greenformer,这是一个在保持性能的同时通过矩阵分解加速神经网络计算的工具包。Greenformer只需一行代码即可轻松应用于任何DNN模型。我们的实验结果表明,Greenformer在很多场景下都是有效的。我们在 https://samuelcahyawijaya.github.io/greenformer-demo/ 提供了Greenformer的演示。
摘要:While the recent advances in deep neural networks (DNN) bring remarkable success, the computational cost also increases considerably. In this paper, we introduce Greenformer, a toolkit to accelerate the computation of neural networks through matrix factorization while maintaining performance. Greenformer can be easily applied with a single line of code to any DNN model. Our experimental results show that Greenformer is effective for a wide range of scenarios. We provide the showcase of Greenformer at https://samuelcahyawijaya.github.io/greenformer-demo/.
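摘要提到Greenformer通过矩阵分解在保持性能的同时加速计算。其基本思路可以用截断SVD对线性层权重做低秩分解来示意(以下为原理性草图,并非该工具包的实际API):

```python
import numpy as np

def factorize_linear(W, rank):
    """用截断SVD把权重矩阵 W (m x n) 分解为 A (m x r) 和 B (r x n).

    当 r*(m+n) < m*n 时, 用 A @ B 近似 W 可同时减少参数量与乘加次数.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]       # m x r, 每列乘以对应奇异值
    B = Vt[:rank, :]                 # r x n
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
A, B = factorize_linear(W, rank=64)
x = rng.standard_normal(512)
# 原层: W @ x 约 262k 次乘法; 分解后: A @ (B @ x) 约 65k 次乘法
err = np.linalg.norm(W @ x - A @ (B @ x)) / np.linalg.norm(W @ x)
print(f"相对误差: {err:.3f}")
```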


【2】 Comparing Reconstruction- and Contrastive-based Models for Visual Task  Planning
标题:视觉任务规划中基于重建与基于对比的模型比较
链接:https://arxiv.org/abs/2109.06737

作者:Constantinos Chamzas,Martina Lippi,Michael C. Welle,Anastasia Varava,Lydia E. Kavraki,Danica Kragic
备注:for the associated project web page, see this https URL
摘要:学习状态表示可以直接从原始观测(如图像)进行机器人规划。大多数方法通过基于从低维潜在空间重建原始观测的损失来学习状态表示。通常假设图像空间中观测之间的相似性,并将其用作估计系统底层状态之间相似性的代理。然而,观测通常包含与任务无关但对重建仍然重要的变化因素,例如不同的照明和不同的相机视点。在这项工作中,我们定义了相关的评估指标,并对用于状态表示学习的不同损失函数进行了深入研究。我们表明,利用任务先验的模型(如具有简单对比损失的暹罗网络)在视觉任务规划中优于基于重建的表示。
摘要:Learning state representations enables robotic planning directly from raw observations such as images. Most methods learn state representations by utilizing losses based on the reconstruction of the raw observations from a lower-dimensional latent space. The similarity between observations in the space of images is often assumed and used as a proxy for estimating similarity between the underlying states of the system. However, observations commonly contain task-irrelevant factors of variation which are nonetheless important for reconstruction, such as varying lighting and different camera viewpoints. In this work, we define relevant evaluation metrics and perform a thorough study of different loss functions for state representation learning. We show that models exploiting task priors, such as Siamese networks with a simple contrastive loss, outperform reconstruction-based representations in visual task planning.


【3】 Learning Density Distribution of Reachable States for Autonomous Systems
标题:学习自治系统可达状态的密度分布
链接:https://arxiv.org/abs/2109.06728

作者:Yue Meng,Dawei Sun,Zeng Qiu,Md Tawhid Bin Waez,Chuchu Fan
机构:MIT, United States, UIUC, Ford
备注:Accepted at CoRL 2021
摘要:与最坏情况可达性相比,状态密度分布可用于安全相关问题,以更好地量化潜在危险情况的风险可能性。在这项工作中,我们提出了一种数据驱动的方法来计算非线性甚至黑箱系统可达态的密度分布。我们的半监督方法从轨迹数据中联合学习系统动力学和状态密度,受状态密度演化遵循Liouville偏微分方程这一事实的指导。借助神经网络可达性工具,我们的方法可以估计所有可能的未来状态集及其密度。此外,我们可以在不安全行为发生的概率范围内进行在线安全验证。我们使用了大量的实验来证明,我们学习的解决方案可以对密度分布产生更准确的估计,并且与最坏情况分析相比,可以更不保守和灵活地量化风险。
摘要:State density distribution, in contrast to worst-case reachability, can be leveraged for safety-related problems to better quantify the likelihood of the risk for potentially hazardous situations. In this work, we propose a data-driven method to compute the density distribution of reachable states for nonlinear and even black-box systems. Our semi-supervised approach learns system dynamics and the state density jointly from trajectory data, guided by the fact that the state density evolution follows the Liouville partial differential equation. With the help of neural network reachability tools, our approach can estimate the set of all possible future states as well as their density. Moreover, we could perform online safety verification with probability ranges for unsafe behaviors to occur. We use an extensive set of experiments to show that our learned solution can produce a much more accurate estimate on density distribution, and can quantify risks less conservatively and flexibly comparing with worst-case analysis.
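摘要中提到状态密度演化遵循Liouville偏微分方程。沿轨迹(特征线)看,该方程给出 $\mathrm{d}\log\rho/\mathrm{d}t=-\nabla\cdot f(x(t))$。下面在一个已知散度的线性系统上用Python数值验证这一恒等式(仅为原理示意;论文中 $f$ 与 $\rho$ 由神经网络从轨迹数据联合学习):

```python
import numpy as np

# Liouville方程的特征线形式: 沿轨迹 dx/dt = f(x), 有 d(log rho)/dt = -div f(x).
# 这里取线性系统 f(x) = A x, 其散度恒为 trace(A), 便于与解析解对照.

A = np.array([[-0.5, 1.0],
              [-1.0, -0.5]])
f = lambda x: A @ x
div_f = np.trace(A)                      # = -1.0

dt, steps = 0.01, 500
x = np.array([1.0, 0.0])
log_rho = 0.0                            # 设初始密度 rho(x0, 0) = 1
for _ in range(steps):
    x = x + dt * f(x)                    # 欧拉法推进状态
    log_rho = log_rho - dt * div_f       # 同步推进对数密度

# 解析解: log rho(T) = -trace(A) * T = 1.0 * 5 = 5.0
print(log_rho, -div_f * dt * steps)
```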


【4】 Learnable Discrete Wavelet Pooling (LDW-Pooling) For Convolutional  Networks
标题:卷积网络的可学习离散小波池(LDW-Pooling)
链接:https://arxiv.org/abs/2109.06638

作者:Jun-Wei Hsieh,Ming-Ching Chang,Ping-Yang Chen,Bor-Shiun Wang,Lipeng Ke,Siwei Lyu
机构:College of Artificial Intelligence and Green Energy, National Yang Ming Chiao Tung University, University at Albany - SUNY, Department of Computer Science, University at Buffalo
摘要:池化是现代深度CNN体系结构中用于特征聚合和提取的一个简单但重要的层。典型的CNN设计侧重于卷积层和激活函数,而池化层的选择较少。我们介绍了可学习离散小波池化(LDW-Pooling),它可以普遍用于取代标准池化操作,以更好地提取特征,提高准确性和效率。受小波理论的启发,我们在水平和垂直方向上采用低通(L)和高通(H)滤波器对二维特征图进行池化。特征信号被分解为四个子带(LL、LH、HL、HH),以更好地保留特征并避免信息丢失。小波变换确保池化后的特征能够得到充分保留和恢复。接下来,我们采用基于能量的注意力学习来精细选择关键的、有代表性的特征。与WaveletPooling和LiftPooling等其他最先进的池化技术相比,LDW-Pooling既有效又高效。大量的实验验证表明,LDW-Pooling可以应用于广泛的标准CNN体系结构,并始终优于标准(最大、平均、混合和随机)池化操作。
摘要:Pooling is a simple but essential layer in modern deep CNN architectures for feature aggregation and extraction. Typical CNN design focuses on the conv layers and activation functions, while leaving the pooling layers with fewer options. We introduce the Learning Discrete Wavelet Pooling (LDW-Pooling) that can be applied universally to replace standard pooling operations to better extract features with improved accuracy and efficiency. Motivated from the wavelet theory, we adopt the low-pass (L) and high-pass (H) filters horizontally and vertically for pooling on a 2D feature map. Feature signals are decomposed into four (LL, LH, HL, HH) subbands to retain features better and avoid information dropping. The wavelet transform ensures features after pooling can be fully preserved and recovered. We next adopt an energy-based attention learning to fine-select crucial and representative features. LDW-Pooling is effective and efficient when compared with other state-of-the-art pooling techniques such as WaveletPooling and LiftPooling. Extensive experimental validation shows that LDW-Pooling can be applied to a wide range of standard CNN architectures and consistently outperform standard (max, mean, mixed, and stochastic) pooling operations.
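摘要描述了将特征图分解为LL、LH、HL、HH四个子带并可完全重建的机制。下面用固定的Haar滤波器给出一个numpy示意(论文中的滤波器是可学习的,此处仅演示分解与无损重建这一性质):

```python
import numpy as np

def haar_pool(x):
    """对二维特征图做一级Haar小波分解, 返回 LL, LH, HL, HH 四个子带.

    LL 可作为池化输出(类似平均池化), 其余子带保留高频细节.
    """
    a = x[0::2, 0::2]   # 每个2x2块的左上
    b = x[0::2, 1::2]   # 右上
    c = x[1::2, 0::2]   # 左下
    d = x[1::2, 1::2]   # 右下
    LL = (a + b + c + d) / 2
    LH = (a - b + c - d) / 2   # 水平高频
    HL = (a + b - c - d) / 2   # 垂直高频
    HH = (a - b - c + d) / 2   # 对角高频
    return LL, LH, HL, HH

x = np.arange(16, dtype=float).reshape(4, 4)
LL, LH, HL, HH = haar_pool(x)
# 由四个子带可精确恢复原特征图, 说明池化过程不丢信息
rec = np.empty_like(x)
rec[0::2, 0::2] = (LL + LH + HL + HH) / 2
rec[0::2, 1::2] = (LL - LH + HL - HH) / 2
rec[1::2, 0::2] = (LL + LH - HL - HH) / 2
rec[1::2, 1::2] = (LL - LH - HL + HH) / 2
print(np.allclose(rec, x))  # True
```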


【5】 Statistical limits of dictionary learning: random matrix theory and the  spectral replica method
标题:字典学习的统计极限:随机矩阵理论和谱复制方法
链接:https://arxiv.org/abs/2109.06610

作者:Jean Barbier,Nicolas Macris
摘要:我们在贝叶斯最优设置下考虑日益复杂的矩阵去噪和字典学习模型,所处的挑战性区域是:待推断矩阵的秩随系统规模线性增长。这与大多数现有文献所关注的低秩(即恒定秩)区域形成对比。我们首先考虑一类旋转不变矩阵去噪问题,其互信息和最小均方误差可用随机矩阵理论的标准技术计算。接下来,我们分析更具挑战性的字典学习模型。为此,我们将统计力学中的复制方法与随机矩阵理论相结合,提出了谱复制方法。它允许我们猜想隐藏表示和噪声数据之间互信息的变分公式,以及量化最佳重建误差的重叠量的变分公式。所提出的方法将自由度从$\Theta(N^2)$(矩阵项)减少到$\Theta(N)$(特征值或奇异值),并产生互信息的库仑气体表示,使人想起物理学中的矩阵模型。主要成分是使用HarishChandra-Itzykson-Zuber球面积分,并在某些重叠矩阵的特征值(或奇异值)概率分布的层面上与一种新的副本对称解耦ansatz相结合。
摘要 :We consider increasingly complex models of matrix denoising and dictionary learning in the Bayes-optimal setting, in the challenging regime where the matrices to infer have a rank growing linearly with the system size. This is in contrast with most existing literature concerned with the low-rank (i.e., constant-rank) regime. We first consider a class of rotationally invariant matrix denoising problems whose mutual information and minimum mean-square error are computable using standard techniques from random matrix theory. Next, we analyze the more challenging models of dictionary learning. To do so we introduce a novel combination of the replica method from statistical mechanics together with random matrix theory, coined spectral replica method. It allows us to conjecture variational formulas for the mutual information between hidden representations and the noisy data as well as for the overlaps quantifying the optimal reconstruction error. The proposed methods reduce the number of degrees of freedom from $\Theta(N^2)$ (matrix entries) to $\Theta(N)$ (eigenvalues or singular values), and yield Coulomb gas representations of the mutual information which are reminiscent of matrix models in physics. The main ingredients are the use of HarishChandra-Itzykson-Zuber spherical integrals combined with a new replica symmetric decoupling ansatz at the level of the probability distributions of eigenvalues (or singular values) of certain overlap matrices.


【6】 Sum-Product-Attention Networks: Leveraging Self-Attention in  Probabilistic Circuits
标题:和-积-注意网络:在概率电路中利用自我注意
链接:https://arxiv.org/abs/2109.06587

作者:Zhongjie Yu,Devendra Singh Dhami,Kristian Kersting
机构:Department of Computer Science, TU Darmstadt, Darmstadt, Germany, Centre for Cognitive Science, TU Darmstadt, and Hessian Center for AI (hessian.AI)
摘要:概率电路已经成为概率建模中学习和推理的事实标准。我们介绍了和积注意网络(SPAN),这是一种新的生成模型,将概率电路与Transformer相结合。SPAN使用自我关注来选择概率电路中最相关的部分,这里是和积网络,以提高底层和积网络的建模能力。我们表明,在建模时,SPAN将重点放在和积网络的每个积层中的一组特定的独立假设上。我们的实证评估表明,SPAN在各种基准数据集上的表现优于最先进的概率生成模型,也是一种高效的生成图像模型。
摘要:Probabilistic circuits (PCs) have become the de-facto standard for learning and inference in probabilistic modeling. We introduce Sum-Product-Attention Networks (SPAN), a new generative model that integrates probabilistic circuits with Transformers. SPAN uses self-attention to select the most relevant parts of a probabilistic circuit, here sum-product networks, to improve the modeling capability of the underlying sum-product network. We show that while modeling, SPAN focuses on a specific set of independent assumptions in every product layer of the sum-product network. Our empirical evaluations show that SPAN outperforms state-of-the-art probabilistic generative models on various benchmark data sets as well is an efficient generative image model.


【7】 A Machine-learning Framework for Acoustic Design Assessment in Early  Design Stages
标题:声学设计早期评估的机器学习框架
链接:https://arxiv.org/abs/2109.06459

作者:Reyhane Abarghooie,Zahra Sadat Zomorodian,Mohammad Tahsildoost,Zohreh Shaghaghian
机构:Shahid Beheshti University, Tehran, Iran, Texas A&M University, College Station, United States
摘要:在时间-成本比例模型研究中,使用模拟方法预测声学性能是首选的常用方法。在这一领域,建筑声学仿真工具受到诸多挑战的制约,包括声学工具的高成本、对声学专业知识的需求以及耗时的声学仿真过程。本项目的目标是引入一个计算时间短的简单模型,以在建筑早期设计阶段估算房间声学条件。本文介绍了一种新的机器学习(ML)方法的工作原型,该方法仅使用几何数据作为输入特征来逼近一系列典型的房间声学参数。一个由单个房间2916种不同配置的声学模拟组成的新数据集用于训练和测试所提出的模型。在模拟过程中,使用Pachyderm声学软件分析了房间尺寸、窗户尺寸、材料吸收系数、家具和遮阳类型等特征。上述数据集用作七个基于全连接深度神经网络(DNN)的机器学习模型的输入。ML模型的平均误差在1%到3%之间,验证过程后新预测样本的平均误差在2%到12%之间。
摘要:In time-cost scale model studies, predicting acoustic performance by using simulation methods is a commonly used method that is preferred. In this field, building acoustic simulation tools are complicated by several challenges, including the high cost of acoustic tools, the need for acoustic expertise, and the time-consuming process of acoustic simulation. The goal of this project is to introduce a simple model with a short calculation time to estimate the room acoustic condition in the early design stages of the building. This paper presents a working prototype for a new method of machine learning (ML) to approximate a series of typical room acoustic parameters using only geometric data as input characteristics. A novel dataset consisting of acoustical simulations of a single room with 2916 different configurations are used to train and test the proposed model. In the simulation process, features that include room dimensions, window size, material absorption coefficient, furniture, and shading type have been analysed by using Pachyderm acoustic software. The mentioned dataset is used as the input of seven machine-learning models based on fully connected Deep Neural Networks (DNN). The average error of ML models is between 1% to 3%, and the average error of the new predicted samples after the validation process is between 2% to 12%.


【8】 A machine-learning framework for daylight and visual comfort assessment  in early design stages
标题:一种用于早期设计阶段日光和视觉舒适性评估的机器学习框架
链接:https://arxiv.org/abs/2109.06450

作者:Hanieh Nourkojouri,Zahra Sadat Zomorodian,Mohammad Tahsildoost,Zohreh Shaghaghian
机构:Shahid Beheshti University, Tehran, Iran, Texas A&M University, College Station, United States
备注:This paper was presented at the 2021 IBPSA (International Building Performance Simulation Association) Conference in Bruges, Belgium
摘要:本研究主要集中于评估机器学习算法在早期设计阶段预测日光和视觉舒适度指标的能力。数据集主要来自Honeybee for Grasshopper的2880次模拟。模拟针对的是一个带单侧窗的鞋盒空间。备选方案来自不同的物理特征,包括房间尺寸、内表面反射率、窗户尺寸和朝向、窗户数量和遮阳状态。5个指标用于日光评估,包括UDI、sDA、mDA、ASE和sVD。通过一种基于Grasshopper的算法对相同鞋盒空间的质量视图(Quality Views)进行了分析,该算法根据LEED v4质量视图评估框架开发。数据集进一步用Python编写的人工神经网络算法进行分析。预测的准确率平均估计为97%。所开发的模型可用于早期设计阶段的分析,而无需在以往使用的平台和程序中进行耗时的模拟。
摘要:This research is mainly focused on the assessment of machine learning algorithms in the prediction of daylight and visual comfort metrics in the early design stages. A dataset was primarily developed from 2880 simulations derived from Honeybee for Grasshopper. The simulations were done for a shoebox space with a one side window. The alternatives emerged from different physical features, including room dimensions, interior surfaces reflectance, window dimensions and orientations, number of windows, and shading states. 5 metrics were used for daylight evaluations, including UDI, sDA, mDA, ASE, and sVD. Quality Views were analyzed for the same shoebox spaces via a grasshopper-based algorithm, developed from the LEED v4 evaluation framework for Quality Views. The dataset was further analyzed with an Artificial Neural Network algorithm written in Python. The accuracy of the predictions was estimated at 97% on average. The developed model could be used in early design stages analyses without the need for time-consuming simulations in previously used platforms and programs.


【9】 Neural Networks with Physics-Informed Architectures and Constraints for  Dynamical Systems Modeling
标题:具有物理信息结构和约束的神经网络用于动态系统建模
链接:https://arxiv.org/abs/2109.06407

作者:Franck Djeumou,Cyrus Neary,Eric Goubault,Sylvie Putot,Ufuk Topcu
机构: The University of Texas at Austin, United States, LIX, CNRS, École Polytechnique, Institut Polytechnique de Paris, France
摘要:将基于物理的知识有效地融入动力系统的深度神经网络模型中,可以极大地提高数据效率和泛化能力。这种先验知识可能来自物理原理(如守恒定律)或系统设计(如机器人的雅可比矩阵),即使大部分系统动力学仍然未知。我们开发了一个从轨迹数据学习动力学模型的框架,同时将先验系统知识作为归纳偏置。更具体地说,该框架使用基于物理的辅助信息来确定神经网络本身的结构,并对模型的输出值和内部状态施加约束。它将系统的向量场表示为已知函数和未知函数的组合,后者由神经网络参数化。在模型训练过程中,通过增广拉格朗日方法强制执行物理信息约束。我们通过实验证明了所提方法在各种动力系统上的优势,包括一套具有大状态空间、非线性动力学、外力、接触力和控制输入的机器人基准环境。在给定相同训练数据集的情况下,通过在训练中利用先验系统知识,所提方法对系统动力学的预测比不包含先验知识的基线方法精确两个数量级。
摘要 :Effective inclusion of physics-based knowledge into deep neural network models of dynamical systems can greatly improve data efficiency and generalization. Such a-priori knowledge might arise from physical principles (e.g., conservation laws) or from the system's design (e.g., the Jacobian matrix of a robot), even if large portions of the system dynamics remain unknown. We develop a framework to learn dynamics models from trajectory data while incorporating a-priori system knowledge as inductive bias. More specifically, the proposed framework uses physics-based side information to inform the structure of the neural network itself, and to place constraints on the values of the outputs and the internal states of the model. It represents the system's vector field as a composition of known and unknown functions, the latter of which are parametrized by neural networks. The physics-informed constraints are enforced via the augmented Lagrangian method during the model's training. We experimentally demonstrate the benefits of the proposed approach on a variety of dynamical systems -- including a benchmark suite of robotics environments featuring large state spaces, non-linear dynamics, external forces, contact forces, and control inputs. By exploiting a-priori system knowledge during training, the proposed approach learns to predict the system dynamics two orders of magnitude more accurately than a baseline approach that does not include prior knowledge, given the same training dataset.
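摘要提到将系统向量场表示为已知函数与神经网络参数化的未知函数之和。以下是该结构的一个极简Python示意(以单摆作为假设的已知物理部分;未包含论文中通过增广拉格朗日方法施加约束的训练步骤):

```python
import numpy as np

rng = np.random.default_rng(0)

# 已知物理部分: 例如无阻尼单摆的动力学 (theta: 角度, omega: 角速度)
def f_known(x):
    theta, omega = x
    return np.array([omega, -9.81 * np.sin(theta)])

# 未知部分(如摩擦)由一个极小的两层网络参数化; 这里只演示结构, 不做训练
W1 = rng.standard_normal((16, 2)) * 0.1
b1 = np.zeros(16)
W2 = rng.standard_normal((2, 16)) * 0.1
b2 = np.zeros(2)

def f_unknown(x):
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

def f(x):
    """系统向量场 = 已知函数 + 神经网络参数化的未知函数."""
    return f_known(x) + f_unknown(x)

print(f(np.array([0.3, 0.0])))
```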


【10】 MindCraft: Theory of Mind Modeling for Situated Dialogue in  Collaborative Tasks
标题:Mindcraft:协作任务中情境对话的心智理论建模
链接:https://arxiv.org/abs/2109.06275

作者:Cristian-Paul Bara,Sky CH-Wang,Joyce Chai
机构:University of Michigan, Columbia University
摘要:自治智能体理想地融入人类世界,意味着它们能够按照人类的方式进行协作。特别是,心智理论在人类协作和交流中对维持共同基础起着重要作用。为了实现情境交互中的心智理论建模,我们引入了一个细粒度数据集,由成对的人类被试在Minecraft的3D虚拟方块世界中执行协作任务构成。随着交互的展开,它提供了刻画协作双方对世界和彼此信念的信息,为研究情境语言交流中的人类协作行为提供了大量机会。作为实现我们目标(开发能够现场推断协作伙伴信念状态的具身AI智能体)的第一步,我们构建并展示了若干心智理论任务的计算模型及其结果。
摘要:An ideal integration of autonomous agents in a human world implies that they are able to collaborate on human terms. In particular, theory of mind plays an important role in maintaining common ground during human collaboration and communication. To enable theory of mind modeling in situated interactions, we introduce a fine-grained dataset of collaborative tasks performed by pairs of human subjects in the 3D virtual blocks world of Minecraft. It provides information that captures partners' beliefs of the world and of each other as an interaction unfolds, bringing abundant opportunities to study human collaborative behaviors in situated language communication. As a first step towards our goal of developing embodied AI agents able to infer belief states of collaborative partners in situ, we build and present results on computational models for several theory of mind tasks.


【11】 Machine Learning for Online Algorithm Selection under Censored Feedback
标题:机器学习在删失反馈下的在线算法选择
链接:https://arxiv.org/abs/2109.06234

作者:Alexander Tornede,Viktor Bengs,Eyke Hüllermeier
机构:Department of Computer Science, Paderborn University, Institute for Informatics, LMU Munich
摘要:在在线算法选择(OAS)中,算法问题类的实例一个接一个地呈现给代理,代理必须从一组固定的候选算法中快速选择一个可能最好的算法。对于诸如可满足性(SAT)之类的决策问题,质量通常指算法的运行时间。由于后者呈现重尾分布,算法通常在超过预定义的时间上限时被停止。因此,用于以数据驱动方式优化算法选择策略的机器学习方法需要处理右删失样本,这是一个迄今为止文献中很少关注的问题。在这项工作中,我们重新审视了用于OAS的多臂老虎机算法,并讨论了它们处理该问题的能力。此外,我们使它们适应面向运行时的损失,在保持空间和时间复杂度与时间范围无关的同时允许部分删失数据。在对ASlib基准修改版本的广泛实验评估中,我们证明了理论基础扎实的基于Thompson抽样的方法表现尤为出色,并且优于现有方法。
摘要:In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms. For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime. As the latter is known to exhibit a heavy-tail distribution, an algorithm is normally stopped when exceeding a predefined upper time limit. As a consequence, machine learning methods used to optimize an algorithm selection strategy in a data-driven manner need to deal with right-censored samples, a problem that has received little attention in the literature so far. In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem. Moreover, we adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon. In an extensive experimental evaluation on an adapted version of the ASlib benchmark, we demonstrate that theoretically well-founded methods based on Thompson sampling perform specifically strong and improve in comparison to existing methods.


【12】 Performance of a Markovian neural network versus dynamic programming on  a fishing control problem
标题:马尔可夫神经网络与动态规划在渔控问题上的性能比较
链接:https://arxiv.org/abs/2109.06856

作者:Mathieu Laurière,Gilles Pagès,Olivier Pironneau
机构:Sorbonne Université
摘要:捕鱼配额虽然令人不快,但对于控制渔场的生产力来说却是有效的。一个流行的模型有一个生物量的随机微分方程,在这个方程上,可以使用随机动态规划或Hamilton-Jacobi-Bellman算法来寻找随机控制——捕捞配额。我们将动态规划得到的解与保留解的马尔可夫性质的神经网络得到的解进行比较。将该方法推广到一个相似的多种群模型,以检验其高维鲁棒性。
摘要:Fishing quotas are unpleasant but efficient to control the productivity of a fishing site. A popular model has a stochastic differential equation for the biomass on which a stochastic dynamic programming or a Hamilton-Jacobi-Bellman algorithm can be used to find the stochastic control -- the fishing quota. We compare the solutions obtained by dynamic programming against those obtained with a neural network which preserves the Markov property of the solution. The method is extended to a similar multi species model to check its robustness in high dimension.


【13】 Neural Upscaling from Residue-level Protein Structure Networks to  Atomistic Structure
标题:从残基级蛋白质结构网络到原子结构的神经放大
链接:https://arxiv.org/abs/2109.06700

作者:Vy Duong,Elizabeth Diessner,Gianmarc Grazioli,Rachel W. Martin,Carter T. Butts
机构:Preprint version, Date: ,,, Department of Chemistry, UC Irvine, Department of Chemistry, San Jose State University, Department of Molecular Biology & Biochemistry, UC Irvine
摘要:粗粒化是扩展蛋白质和其他生物大分子动力学模型范围的有力工具。拓扑粗粒化(通过图结构表示生物分子或其集合)是获得分子结构高度压缩表示的一种特别有用的方法,基于这种表示运行的模拟可以实现大量的计算节省。然而,粗粒化的一个缺点是原子细节的丢失,对于蛋白质结构网络(PSN)等拓扑表示而言,这种影响尤其严重。在这里,我们介绍了一种结合机器学习与物理引导细化的方法,用于从PSN推断原子坐标。这种"神经放大"程序利用了PSN对可能构型的约束,以及相同PSN下观察到不同构型的可能性差异。使用A$\beta_{1-40}$的1$\mu$s全原子分子动力学轨迹,我们表明神经放大能够有效地重现内在无序蛋白质的详细结构信息,特别是在恢复瞬时二级结构等特征方面取得了成功。这些结果表明,基于可扩展网络的蛋白质结构和动力学模型可用于需要原子细节的场合,利用放大从PSN推算原子坐标。
摘要:Coarse-graining is a powerful tool for extending the reach of dynamic models of proteins and other biological macromolecules. Topological coarse-graining, in which biomolecules or sets thereof are represented via graph structures, is a particularly useful way of obtaining highly compressed representations of molecular structure, and simulations operating via such representations can achieve substantial computational savings. A drawback of coarse-graining, however, is the loss of atomistic detail - an effect that is especially acute for topological representations such as protein structure networks (PSNs). Here, we introduce an approach based on a combination of machine learning and physically-guided refinement for inferring atomic coordinates from PSNs. This "neural upscaling" procedure exploits the constraints implied by PSNs on possible configurations, as well as differences in the likelihood of observing different configurations with the same PSN. Using a 1 $\mu$s atomistic molecular dynamics trajectory of A$\beta_{1-40}$, we show that neural upscaling is able to effectively recapitulate detailed structural information for intrinsically disordered proteins, being particularly successful in recovering features such as transient secondary structure. These results suggest that scalable network-based models for protein structure and dynamics may be used in settings where atomistic detail is desired, with upscaling employed to impute atomic coordinates from PSNs.


【14】 Deep Convolutional Generative Modeling for Artificial Microstructure  Development of Aluminum-Silicon Alloy
标题:铝硅合金人工微观组织发展的深度卷积生成式建模
链接:https://arxiv.org/abs/2109.06635

作者:Akshansh Mishra,Tarushi Pathak
机构:Stir Research Technologies, SRM Institute of Science and Technology, Kattangulathur, India
备注:None
摘要:机器学习是人工智能的一个子领域,在制造业和材料科学领域有着广泛的应用。在本研究中,深度生成建模(Deep Generative Modeling)这一无监督机器学习技术被用于构建铝硅合金的人工微观结构。深度生成对抗网络被用于在给定的微观结构图像数据集上生成人工微观结构。结果表明,所开发的模型已学会复制特定微观结构图像附近的纹理线条(lining)。
摘要:Machine learning, a sub-domain of Artificial Intelligence, is finding various applications in the manufacturing and material science sectors. In the present study, Deep Generative Modeling, a type of unsupervised machine learning technique, has been adapted for constructing the artificial microstructure of Aluminium-Silicon alloy. Deep Generative Adversarial Networks have been used for developing the artificial microstructure of the given microstructure image dataset. The results obtained showed that the developed models had learnt to replicate the lining near certain images of the microstructures.


【15】 On the regularized risk of distributionally robust learning over deep  neural networks
标题:关于深度神经网络上分布鲁棒学习的正则化风险
链接:https://arxiv.org/abs/2109.06294

作者:Camilo Garcia Trillos,Nicolas Garcia Trillos
摘要:在本文中,我们探讨了分布鲁棒学习与用于增强深度神经网络鲁棒性的不同正则化形式之间的关系。特别地,我们从一个具体的最小-最大分布鲁棒问题出发,利用最优传输理论的工具,根据适当的正则化风险最小化问题,导出了分布鲁棒问题的一阶和二阶近似。在深度ResNet模型的背景下,我们将由此产生的正则化问题的结构识别为平均场最优控制问题,其中状态变量的数量和维数与原始非鲁棒问题的维数至多相差一个与维数无关的因子。利用与这些问题相关的Pontryagin最大值原理,我们引出了一族用于训练鲁棒神经网络的可扩展算法。我们的分析恢复了文献中已知的一些结果和算法(在本文所阐明的设置下),并提供了许多据我们所知是新颖的理论和算法见解。在分析中,我们使用了一些我们认为对未来分析更一般的对抗学习问题有用的工具。
摘要:In this paper we explore the relation between distributionally robust learning and different forms of regularization to enforce robustness of deep neural networks. In particular, starting from a concrete min-max distributionally robust problem, and using tools from optimal transport theory, we derive first order and second order approximations to the distributionally robust problem in terms of appropriate regularized risk minimization problems. In the context of deep ResNet models, we identify the structure of the resulting regularization problems as mean-field optimal control problems where the number and dimension of state variables is within a dimension-free factor of the dimension of the original unrobust problem. Using the Pontryagin maximum principles associated to these problems we motivate a family of scalable algorithms for the training of robust neural networks. Our analysis recovers some results and algorithms known in the literature (in settings explained throughout the paper) and provides many other theoretical and algorithmic insights that to our knowledge are novel. In our analysis we employ tools that we deem useful for a future analysis of more general adversarial learning problems.


其他(18篇)

【1】 Nonlinearities in Steerable SO(2)-Equivariant CNNs
标题:可操控SO(2)-等变CNN的非线性
链接:https://arxiv.org/abs/2109.06861

作者:Daniel Franzen,Michael Wand
机构:Institute of Computer Science, Johannes-Gutenberg University Mainz, Staudingerweg ,  Mainz, Germany
摘要:对称性下的不变性是机器学习中的一个重要问题。我们的论文特别关注等变神经网络,其中输入的变换会产生输出的同态变换。在这里,可操控CNN已成为标准解决方案。可操控表示的一个固有问题是,一般的非线性层会破坏等变性,从而限制了体系结构的选择。本文应用谐波失真分析阐明了非线性对SO(2)傅里叶表示的影响。我们开发了一种新的基于FFT的算法,用于在保持频带限制的同时计算经非线性变换的激活的表示。它对多项式(近似)非线性产生精确的等变性,并对一般函数给出精度可调的近似解。我们应用该方法为采样的三维表面数据构建了一个完全E(3)-等变网络。在2D和3D数据的实验中,我们得到的结果在精度方面优于最新技术,同时允许连续对称性和精确等变性。
摘要:Invariance under symmetry is an important problem in machine learning. Our paper looks specifically at equivariant neural networks where transformations of inputs yield homomorphic transformations of outputs. Here, steerable CNNs have emerged as the standard solution. An inherent problem of steerable representations is that general nonlinear layers break equivariance, thus restricting architectural choices. Our paper applies harmonic distortion analysis to illuminate the effect of nonlinearities on Fourier representations of SO(2). We develop a novel FFT-based algorithm for computing representations of non-linearly transformed activations while maintaining band-limitation. It yields exact equivariance for polynomial (approximations of) nonlinearities, as well as approximate solutions with tunable accuracy for general functions. We apply the approach to build a fully E(3)-equivariant network for sampled 3D surface data. In experiments with 2D and 3D data, we obtain results that compare favorably to the state-of-the-art in terms of accuracy while permitting continuous symmetry and exact equivariance.


【2】 Types of Out-of-Distribution Texts and How to Detect Them
标题:分布外文本的类型及其检测方法
链接:https://arxiv.org/abs/2109.06827

作者:Udit Arora,William Huang,He He
机构:New York University, Capital One
备注:EMNLP 2021
摘要:尽管对检测分布外(OOD)示例的重要性达成了一致,但对于OOD示例的正式定义以及如何最好地检测它们,几乎没有共识。我们根据这些例子是否表现出背景变化或语义变化对其进行分类,并发现OOD检测的两种主要方法,模型校准和密度估计(文本语言建模),在这些类型的OOD数据上具有不同的行为。在14对分布内和OOD英语自然语言理解数据集中,我们发现密度估计方法在背景移位设置中始终优于校准方法,而在语义移位设置中表现更差。此外,我们发现这两种方法通常无法从挑战数据中检测到示例,这突出了当前方法的一个弱点。由于没有一种方法能够在所有设置中都很好地工作,因此我们的结果要求在评估不同的检测方法时明确定义OOD示例。
摘要:Despite agreement on the importance of detecting out-of-distribution (OOD) examples, there is little consensus on the formal definition of OOD examples and how to best detect them. We categorize these examples by whether they exhibit a background shift or a semantic shift, and find that the two major approaches to OOD detection, model calibration and density estimation (language modeling for text), have distinct behavior on these types of OOD data. Across 14 pairs of in-distribution and OOD English natural language understanding datasets, we find that density estimation methods consistently beat calibration methods in background shift settings, while performing worse in semantic shift settings. In addition, we find that both methods generally fail to detect examples from challenge data, highlighting a weak spot for current methods. Since no single method works well across all settings, our results call for an explicit definition of OOD examples when evaluating different detection methods.


【3】 Multiple shooting with neural differential equations
标题:神经微分方程的多重打靶法
链接:https://arxiv.org/abs/2109.06786

作者:Evren Mert Turan,Johannes Jäschke
机构:Norwegian University of Science and Technology (NTNU)
摘要:最近,神经微分方程作为一种灵活的数据驱动/混合方法出现在时间序列数据建模中。这项工作通过实验证明,如果数据包含振荡,那么神经微分方程的标准拟合可能会给出无法描述数据的平坦轨迹。然后,我们介绍了多重打靶法,并成功演示了将神经微分方程拟合到标准方法无法拟合的两个数据集(合成数据集和实验数据集)的方法。使用惩罚或增广拉格朗日方法可以满足多次射击引入的约束。
摘要:Neural differential equations have recently emerged as a flexible data-driven/hybrid approach to model time-series data. This work experimentally demonstrates that if the data contains oscillations, then standard fitting of a neural differential equation may give flattened out trajectory that fails to describe the data. We then introduce the multiple shooting method and present successful demonstrations of this method for the fitting of a neural differential equation to two datasets (synthetic and experimental) that the standard approach fails to fit. Constraints introduced by multiple shooting can be satisfied using a penalty or augmented Lagrangian method.
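多重打靶法将长轨迹切成若干短段,各段拥有自己的初始条件并独立拟合数据,再用罚项(或摘要中提到的增广拉格朗日项)强制相邻段连续。下面是一个numpy示意(系统、段长与罚系数均为示例假设,并非论文的具体设置):

```python
import numpy as np

def simulate(x0, theta, n, dt=0.1):
    """用欧拉法模拟简单的参数化动力系统 dx/dt = -theta * x."""
    xs = [x0]
    for _ in range(n):
        xs.append(xs[-1] + dt * (-theta * xs[-1]))
    return np.array(xs)

def multiple_shooting_loss(theta, segments_x0, data, seg_len, rho=10.0, dt=0.1):
    """多重打靶损失: 各段独立模拟并拟合数据, 加上相邻段的连续性罚项.

    segments_x0: 每段各自的初始条件(与 theta 一起作为优化变量).
    rho: 连续性约束的罚系数(也可换成增广拉格朗日乘子).
    """
    loss = 0.0
    for i, x0 in enumerate(segments_x0):
        traj = simulate(x0, theta, seg_len, dt)
        target = data[i * seg_len:(i + 1) * seg_len + 1]
        loss += np.sum((traj - target) ** 2)          # 段内数据拟合
        if i + 1 < len(segments_x0):
            gap = traj[-1] - segments_x0[i + 1]       # 段间不连续量
            loss += rho * gap ** 2                    # 连续性罚项
    return loss

# 合成数据: 真实 theta = 1.5; 参数与各段初始条件均正确时, 打靶损失为0
data = simulate(1.0, 1.5, 20)
x0s = data[::5][:4].copy()            # 4段, 每段5步, 初始条件取自数据
print(multiple_shooting_loss(1.5, x0s, data, seg_len=5))
```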


【4】 Benchmarking the Spectrum of Agent Capabilities
标题:对智能体能力范围进行基准测试
链接:https://arxiv.org/abs/2109.06780

作者:Danijar Hafner
机构:Google Research, Brain Team, University of Toronto
备注:Website: this https URL
摘要:评估智能体的通用能力需要复杂的仿真环境。现有的基准测试通常每个环境只评估一个狭窄任务,要求研究人员在许多不同的环境中执行昂贵的训练运行。我们介绍了Crafter,这是一款具有视觉输入的开放世界生存游戏,可在单一环境中评估各种通用能力。智能体要么从提供的奖励信号中学习,要么通过内在目标学习,并通过语义上有意义的成就进行评估,这些成就可以在每一局中解锁,例如发现资源和制作工具。始终如一地解锁所有成就需要强大的泛化能力、深入的探索和长期的推理。我们通过实验验证了Crafter具有适当的难度以推动未来研究,并提供了奖励智能体和无监督智能体的基线分数。此外,我们观察到从最大化奖励信号中涌现出的复杂行为,例如修建隧道系统、桥梁、房屋和种植园。我们希望Crafter能够通过快速评估广泛的能力来加速研究进展。
摘要:Evaluating the general abilities of intelligent agents requires complex simulation environments. Existing benchmarks typically evaluate only one narrow task per environment, requiring researchers to perform expensive training runs on many different environments. We introduce Crafter, an open world survival game with visual inputs that evaluates a wide range of general abilities within a single environment. Agents either learn from the provided reward signal or through intrinsic objectives and are evaluated by semantically meaningful achievements that can be unlocked during each episode, such as discovering resources and crafting tools. Consistently unlocking all achievements requires strong generalization, deep exploration, and long-term reasoning. We experimentally verify that Crafter is of appropriate difficulty to drive future research and provide baselines scores of reward agents and unsupervised agents. Furthermore, we observe sophisticated behaviors emerging from maximizing the reward signal, such as building tunnel systems, bridges, houses, and plantations. We hope that Crafter will accelerate research progress by quickly evaluating a wide spectrum of abilities.


【5】 HPOBench: A Collection of Reproducible Multi-Fidelity Benchmark Problems  for HPO
标题:HPOBench:一组可重现的HPO多保真基准问题
链接:https://arxiv.org/abs/2109.06716

作者:Katharina Eggensperger,Philipp Müller,Neeratyoy Mallik,Matthias Feurer,René Sass,Aaron Klein,Noor Awad,Marius Lindauer,Frank Hutter
机构: Albert-Ludwigs-Universität Freiburg , Leibniz Universität Hannover,  Amazon , Bosch Center for Artificial Intelligence
摘要:为了实现峰值预测性能,超参数优化(HPO)是机器学习及其应用的重要组成部分。在过去的几年里,高效的HPO算法和工具的数量大幅增长。与此同时,社区仍然缺乏现实的、多样化的、计算成本低廉的和标准化的基准。对于多保真度HPO方法尤其如此。为了缩小这一差距,我们提出了HPOBench,其中包括7个现有的和5个新的基准系列,总共有100多个多保真度基准问题。HPOBench允许以可复制的方式运行这组可扩展的多保真度HPO基准测试,方法是将各个基准测试隔离并封装在容器中。它还为计算上负担得起但在统计上可靠的评估提供了替代基准和表格基准。为了证明HPOBench的广泛兼容性及其有用性,我们进行了一项示范性大规模研究,评估了6种著名的多保真度HPO工具。
摘要:To achieve peak predictive performance, hyperparameter optimization (HPO) is a crucial component of machine learning and its applications. Over the last years,the number of efficient algorithms and tools for HPO grew substantially. At the same time, the community is still lacking realistic, diverse, computationally cheap,and standardized benchmarks. This is especially the case for multi-fidelity HPO methods. To close this gap, we propose HPOBench, which includes 7 existing and 5 new benchmark families, with in total more than 100 multi-fidelity benchmark problems. HPOBench allows to run this extendable set of multi-fidelity HPO benchmarks in a reproducible way by isolating and packaging the individual benchmarks in containers. It also provides surrogate and tabular benchmarks for computationally affordable yet statistically sound evaluations. To demonstrate the broad compatibility of HPOBench and its usefulness, we conduct an exemplary large-scale study evaluating 6 well known multi-fidelity HPO tools.


【6】 LRWR: Large-Scale Benchmark for Lip Reading in Russian language
标题:LRWR:大型俄语唇语阅读基准
链接:https://arxiv.org/abs/2109.06692

作者:Evgeniy Egorov,Vasily Kostyumov,Mikhail Konyk,Sergey Kolesnikov
机构:Moscow Institute of Physics and Technology, Tinkoff.AI
摘要:唇读,也称为视觉语音识别,旨在通过分析嘴唇及其附近区域的视觉形变来识别视频中的语音内容。该领域研究的一个重大障碍是缺乏覆盖多种语言的合适数据集:到目前为止,这些方法只关注英语或汉语。在本文中,我们介绍了一个自然分布的大规模俄语唇读基准LRWR,它包含235个类别和135个说话人。我们详细描述了数据集收集流程和数据集统计信息。我们还对目前流行的唇读方法在LRWR上进行了综合比较,并对其性能进行了详细分析。结果显示了所测语言之间的差异,并为唇读模型的微调提供了几个有希望的方向。得益于这些发现,我们在LRW基准上也取得了新的最先进结果。
摘要:Lipreading, also known as visual speech recognition, aims to identify the speech content from videos by analyzing the visual deformations of lips and nearby areas. One of the significant obstacles for research in this field is the lack of proper datasets for a wide variety of languages: so far, these methods have been focused only on English or Chinese. In this paper, we introduce a naturally distributed large-scale benchmark for lipreading in Russian language, named LRWR, which contains 235 classes and 135 speakers. We provide a detailed description of the dataset collection pipeline and dataset statistics. We also present a comprehensive comparison of the current popular lipreading methods on LRWR and conduct a detailed analysis of their performance. The results demonstrate the differences between the benchmarked languages and provide several promising directions for lipreading models finetuning. Thanks to our findings, we also achieved new state-of-the-art results on the LRW benchmark.


【7】 Reactive and Safe Road User Simulations using Neural Barrier  Certificates
标题:基于神经屏障证书的反应性安全道路用户模拟
链接:https://arxiv.org/abs/2109.06689

作者:Yue Meng,Zengyi Qin,Chuchu Fan
机构:Zengyi Qin and Chuchu Fan are with the Department of Aeronautics and Astronautics, Massachusetts Institute of Technology
备注:Accepted at IROS 2021
摘要:反应式且安全的智能体建模对于当今的交通模拟器设计和安全规划应用非常重要。在这项工作中,我们提出了一种反应式智能体模型,它仅从专家数据中学习高层决策,并由与之联合学习的分散式屏障证书引导低层分散控制器,从而在不损害原有目标的情况下确保安全性。实证结果表明,与最先进的模仿学习和纯基于控制的方法相比,我们学习的道路使用者仿真模型可以显著提高安全性,同时对专家数据的误差更小,因而与人类智能体更为接近。此外,我们学习的反应式智能体对未见过的交通状况具有更好的泛化能力,并能更好地对其他道路使用者作出反应,因此有助于务实地理解具有挑战性的规划问题。
摘要:Reactive and safe agent modelings are important for nowadays traffic simulator designs and safe planning applications. In this work, we proposed a reactive agent model which can ensure safety without compromising the original purposes, by learning only high-level decisions from expert data and a low-level decentralized controller guided by the jointly learned decentralized barrier certificates. Empirical results show that our learned road user simulation models can achieve a significant improvement in safety comparing to state-of-the-art imitation learning and pure control-based methods, while being similar to human agents by having smaller errors to the expert data. Moreover, our learned reactive agents are shown to generalize better to unseen traffic conditions, and react better to other road users and therefore can help understand challenging planning problems pragmatically.
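作为背景补充:摘要中的屏障证书可以理解为一个标量函数 $h$,其在文献中常见的一般形式条件为:对安全初始集内的 $x$ 有 $h(x)\ge 0$,对不安全集内的 $x$ 有 $h(x)<0$,且沿闭环动力学满足 $\dot h(x)+\alpha\,h(x)\ge 0$(常数 $\alpha>0$)。满足这些条件即可保证从安全集出发的轨迹永不进入不安全集;按摘要所述,该论文将这类分散式屏障证书由神经网络联合学习得到。此处给出的是标准形式,并非该论文的具体构造。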


【8】 Scalable Font Reconstruction with Dual Latent Manifolds
标题:基于对偶潜在流形的可伸缩字体重建
链接:https://arxiv.org/abs/2109.06627

作者:Nikita Srivatsan,Si Wu,Jonathan T. Barron,Taylor Berg-Kirkpatrick
机构:Language Technologies Institute, Carnegie Mellon University, Khoury College of Computer Science, Northeastern University, Google Research, Computer Science and Engineering, University of California, San Diego
备注:EMNLP 2021
摘要:我们提出了一个深度生成模型,通过学习字体风格和字符形状的解耦流形来进行排版分析和字体重建。与以前的方法相比,我们的方法使我们能够大规模增加可以有效建模的字符类型数量。具体来说,我们通过一对推理网络来推断分别表示字符和字体的潜在变量,这些网络以字形集合为输入,集合内的字形或者共享同一字符类型,或者属于同一字体。这种设计使我们的模型能够泛化到训练期间未观察到的字符,考虑到大多数字体的相对稀疏性,这是一项重要任务。我们还提出了一种新的损失,它改编自先前的工作,使用投影空间中的自适应分布来度量似然,从而在不需要判别器的情况下生成更自然的图像。我们在涵盖多种语言字符类型的各种数据集上评估了字体重建任务,根据自动和人工评估的指标,与现代风格迁移系统相比表现优异。
摘要 :We propose a deep generative model that performs typography analysis and font reconstruction by learning disentangled manifolds of both font style and character shape. Our approach enables us to massively scale up the number of character types we can effectively model compared to previous methods. Specifically, we infer separate latent variables representing character and font via a pair of inference networks which take as input sets of glyphs that either all share a character type, or belong to the same font. This design allows our model to generalize to characters that were not observed during training time, an important task in light of the relative sparsity of most fonts. We also put forward a new loss, adapted from prior work that measures likelihood using an adaptive distribution in a projected space, resulting in more natural images without requiring a discriminator. We evaluate on the task of font reconstruction over various datasets representing character types of many languages, and compare favorably to modern style transfer systems according to both automatic and manually-evaluated metrics.


【9】 Deep hierarchical reinforcement agents for automated penetration testing
标题:用于自动渗透测试的深度分层强化智能体
链接:https://arxiv.org/abs/2109.06449

作者:Khuong Tran,Ashlesha Akella,Maxwell Standen,Junae Kim,David Bowman,Toby Richer,Chin-Teng Lin
机构:University of Technology Sydney, Australia, Defence Science and Technology Group, Australia
备注:Presented at 1st International Workshop on Adaptive Cyber Defense, 2021 (arXiv:2108.08476)
摘要:渗透测试,即为测试现有防御而对计算机系统进行的有组织攻击,已被广泛用于评估网络安全。这是一个耗时的过程,需要深入的知识来制定类似真实网络攻击的策略。本文提出了一种新的深度强化学习体系结构HA-DRL,其智能体具有分层结构,采用代数动作分解策略来处理自主渗透测试模拟器的大型离散动作空间,其中动作数量随所设计网络安全网络的复杂性呈指数增长。结果表明,与常用于在自动渗透测试中应用人工智能的传统深度Q学习智能体相比,所提出的体系结构能更快、更稳定地找到最优攻击策略。
摘要:Penetration testing, the organised attack of a computer system in order to test existing defences, has been used extensively to evaluate network security. This is a time consuming process and requires in-depth knowledge for the establishment of a strategy that resembles a real cyber-attack. This paper presents a novel deep reinforcement learning architecture with hierarchically structured agents called HA-DRL, which employs an algebraic action decomposition strategy to address the large discrete action space of an autonomous penetration testing simulator where the number of actions is exponentially increased with the complexity of the designed cybersecurity network. The proposed architecture is shown to find the optimal attacking policy faster and more stably than a conventional deep Q-learning agent which is commonly used as a method to apply artificial intelligence in automatic penetration testing.


【10】 Exploring the Long Short-Term Dependencies to Infer Shot Influence in  Badminton Matches
标题:探索长短期依赖以推断羽毛球比赛中的击球影响
链接:https://arxiv.org/abs/2109.06431

作者:Wei-Yao Wang,Teng-Fong Chan,Hui-Kuo Yang,Chih-Chuan Wang,Yao-Chung Fan,Wen-Chih Peng
机构:National Yang Ming Chiao Tung University, Hsinchu, Taiwan, National Chung Hsing University, Taichung, Taiwan
备注:6 pages, accepted by ICDM 2021
摘要:在羽毛球比赛中,识别回合中的关键击球对于评估球员的表现非常重要。虽然已有一些研究量化了其他体育项目中运动员的表现,但羽毛球数据的分析仍是空白。在本文中,我们引入了一种羽毛球语言来完整描述击球过程,并提出了一个由新颖的短期提取器和长期编码器组成的深度学习模型,通过将问题表述为预测回合结果来捕捉羽毛球回合中逐球的序列。我们的模型结合了注意力机制,使动作序列对回合结果具有透明度,这对于羽毛球专家获得可解释的预测至关重要。基于真实数据集的实验评估表明,我们提出的模型优于强基线。源代码公开于 https://github.com/yao0510/Shot-Influence 。
摘要:Identifying significant shots in a rally is important for evaluating players' performance in badminton matches. While there are several studies that have quantified player performance in other sports, analyzing badminton data is remained untouched. In this paper, we introduce a badminton language to fully describe the process of the shot and propose a deep learning model composed of a novel short-term extractor and a long-term encoder for capturing a shot-by-shot sequence in a badminton rally by framing the problem as predicting a rally result. Our model incorporates an attention mechanism to enable the transparency of the action sequence to the rally result, which is essential for badminton experts to gain interpretable predictions. Experimental evaluation based on a real-world dataset demonstrates that our proposed model outperforms the strong baselines. The source code is publicly available at https://github.com/yao0510/Shot-Influence.


【11】 Exploring Personality and Online Social Engagement: An Investigation of  MBTI Users on Twitter
标题:探索个性与在线社会参与:对推特上MBTI用户的调查
链接:https://arxiv.org/abs/2109.06402

作者:Partha Kadambi
摘要:基于文本的计算模型人格预测是一个新兴领域,有可能显著改善基于调查的人格评估的关键弱点。我们调查了3848份来自Twitter的个人资料,其中有自我标记的Myers-Briggs个性特征(MBTI)——一个与五因素人格模型密切相关的框架——以更好地理解如何使用在线社交活动中基于文本的数字跟踪来预测用户的个性特征。我们利用BERT(一种基于深度学习的最先进的NLP体系结构)来分析对我们的任务具有最大预测能力的各种文本源。我们发现,传记、状态和喜欢的推文对MBTI系统的所有维度都具有重要的预测能力。我们讨论了我们的发现及其对MBTI和词汇假设有效性的影响。词汇假设是连接语言使用和行为的五因素模型的基础理论。我们的研究结果对人格心理学家、计算语言学家和其他旨在通过观察文本数据预测人格并探索语言与核心行为特征之间联系的社会科学家具有乐观的意义。
摘要:Text-based personality prediction by computational models is an emerging field with the potential to significantly improve on key weaknesses of survey-based personality assessment. We investigate 3848 profiles from Twitter with self-labeled Myers-Briggs personality traits (MBTI) - a framework closely related to the Five Factor Model of personality - to better understand how text-based digital traces from social engagement online can be used to predict user personality traits. We leverage BERT, a state-of-the-art NLP architecture based on deep learning, to analyze various sources of text that hold most predictive power for our task. We find that biographies, statuses, and liked tweets contain significant predictive power for all dimensions of the MBTI system. We discuss our findings and their implications for the validity of the MBTI and the lexical hypothesis, a foundational theory underlying the Five Factor Model that links language use and behavior. Our results hold optimistic implications for personality psychologists, computational linguists, and other social scientists aiming to predict personality from observational text data and explore the links between language and core behavioral traits.


【12】 Rationales for Sequential Predictions
标题:序贯预测的基本原理
链接:https://arxiv.org/abs/2109.06387

作者:Keyon Vafa,Yuntian Deng,David M. Blei,Alexander M. Rush
机构:Columbia University, Harvard University, Cornell Tech
备注:To appear in the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021)
摘要:序列模型是现代自然语言处理系统的重要组成部分,但它们的预测很难解释。我们通过基本原理(rationale)来考察模型解释:基本原理是能够解释单个模型预测的上下文子集。我们通过求解一个组合优化问题来寻找序列的基本原理:最好的基本原理是能预测出与完整序列相同输出的最小输入token子集。枚举所有子集是难以处理的,因此我们提出了一种高效的贪婪算法来逼近这一目标。该算法称为贪婪合理化,适用于任何模型。为了使这种方法有效,模型在对上下文的不完整子集进行预测时应形成相容的条件分布。这一条件可以通过一个简短的微调步骤来强制满足。我们在语言建模和机器翻译上研究贪婪合理化。与现有基线相比,贪婪合理化最能优化组合目标,并提供最忠实的基本原理。在一个新的带注释的序列基本原理数据集上,贪婪方法得到的基本原理与人类给出的最为相似。
摘要:Sequence models are a critical component of modern NLP systems, but their predictions are difficult to explain. We consider model explanations though rationales, subsets of context that can explain individual model predictions. We find sequential rationales by solving a combinatorial optimization: the best rationale is the smallest subset of input tokens that would predict the same output as the full sequence. Enumerating all subsets is intractable, so we propose an efficient greedy algorithm to approximate this objective. The algorithm, which is called greedy rationalization, applies to any model. For this approach to be effective, the model should form compatible conditional distributions when making predictions on incomplete subsets of the context. This condition can be enforced with a short fine-tuning step. We study greedy rationalization on language modeling and machine translation. Compared to existing baselines, greedy rationalization is best at optimizing the combinatorial objective and provides the most faithful rationales. On a new dataset of annotated sequential rationales, greedy rationales are most similar to human rationales.
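贪婪合理化的流程是:从空子集出发,每步加入最能提高目标输出概率的token,直到子集的预测与完整序列一致。下面用一个玩具模型给出Python示意(模型、词表与概率均为虚构示例,仅演示算法骨架):

```python
def greedy_rationalize(model, tokens, target):
    """贪婪合理化: 逐个加入使目标输出概率最大的token, 直到子集预测与完整序列一致.

    model(subset) 需返回各候选输出的概率字典(对不完整上下文也给出相容的条件分布).
    """
    chosen = []
    remaining = list(range(len(tokens)))
    while remaining:
        # 选出加入后使 target 概率最大的那个 token
        best = max(remaining,
                   key=lambda i: model([tokens[j] for j in sorted(chosen + [i])])[target])
        chosen.append(best)
        remaining.remove(best)
        probs = model([tokens[j] for j in sorted(chosen)])
        if max(probs, key=probs.get) == target:   # 子集已能预测出相同输出
            return sorted(chosen)
    return sorted(chosen)

# 玩具模型: "not" 和 "good" 都会提高负面情感的概率
def toy_model(subset):
    p_neg = 0.1 + 0.5 * ("not" in subset) + 0.3 * ("good" in subset)
    return {"negative": p_neg, "positive": 1.0 - p_neg}

tokens = ["the", "movie", "was", "not", "good"]
# 完整序列预测 "negative"; 贪婪合理化找到能复现该预测的最小子集
print(greedy_rationalize(toy_model, tokens, "negative"))  # -> [3], 即 "not"
```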


【13】 ML Based Lineage in Databases
标题:数据库中基于ML的谱系
链接:https://arxiv.org/abs/2109.06339

作者:Michael Leybovich,Oded Shmueli
机构:Technion, Haifa, Israel
摘要:在这项工作中,我们跟踪元组在其数据库生命周期中的沿袭。也就是说,我们考虑这样一个场景:作为正常工作流的一部分,由查询产生的元组(记录)可能会影响其他元组插入数据库。随着时间的推移,这些元组的精确起源解释变得嵌套越来越深,占用空间越来越多,导致清晰度和可读性下降。我们提出了一种近似沿袭跟踪的新方法,使用机器学习(ML)和自然语言处理(NLP)技术,即词嵌入。其基本思想是通过一小组固定大小的向量来总结(并近似)每个元组的沿袭(每个元组的向量数是一个超参数)。因此,我们的解决方案不会随时间出现空间复杂度膨胀,并且它"自然地"对元组存在的解释进行排序。我们设计了一种替代且改进的沿袭跟踪机制,即在列级别跟踪和查询沿袭;由此,我们能够更好地区分元组的来源特征和文本特征。我们通过扩展(ProvSQL)将沿袭计算集成到PostgreSQL系统中,并在实验中相对于精确的、基于半环的证明展示了良好的准确性结果。在实验中,我们重点研究生命周期沿袭中包含多代元组的元组,并从直接沿袭和远端沿袭两个角度进行分析。实验表明,所提出的近似沿袭方法及进一步建议的增强方法具有很高的实用潜力。对于表现出高精确率和高逐层召回率的基于列的向量方法而言尤其如此。
摘要:In this work, we track the lineage of tuples throughout their database lifetime. That is, we consider a scenario in which tuples (records) that are produced by a query may affect other tuple insertions into the DB, as part of a normal workflow. As time goes on, exact provenance explanations for such tuples become deeply nested, increasingly consuming space, and resulting in decreased clarity and readability. We present a novel approach for approximating lineage tracking, using a Machine Learning (ML) and Natural Language Processing (NLP) technique; namely, word embedding. The basic idea is summarizing (and approximating) the lineage of each tuple via a small set of constant-size vectors (the number of vectors per-tuple is a hyperparameter). Therefore, our solution does not suffer from space complexity blow-up over time, and it "naturally ranks" explanations to the existence of a tuple. We devise an alternative and improved lineage tracking mechanism, that of keeping track of and querying lineage at the column level; thereby, we manage to better distinguish between the provenance features and the textual characteristics of a tuple. We integrate our lineage computations into the PostgreSQL system via an extension (ProvSQL) and experimentally exhibit useful results in terms of accuracy against exact, semiring-based, justifications. In the experiments, we focus on tuples with multiple generations of tuples in their lifelong lineage and analyze them in terms of direct and distant lineage. The experiments suggest a high usefulness potential for the proposed approximate lineage methods and the further suggested enhancements. This especially holds for the column-based vectors method which exhibits high precision and high per-level recall.


【14】 State Relevance for Off-Policy Evaluation
标题:离策略评估的状态相关性
链接:https://arxiv.org/abs/2109.06310

作者:Simon P. Shen,Yecheng Jason Ma,Omer Gottesman,Finale Doshi-Velez
机构:Harvard University, MA; University of Pennsylvania, PA; Brown University
备注:None
摘要:用于离策略评估(OPE)的基于重要性抽样的估计量因其简单、无偏和依赖相对较少的假设而受到重视。然而,这些估计量的方差通常很高,特别是当轨迹长度不同时。在这项工作中,我们引入了省略与回报无关状态的重要性抽样(OSIRIS),这是一种通过策略性地省略与某些状态相关的似然比来降低方差的估计量。我们形式化了OSIRIS无偏且方差低于普通重要性抽样的条件,并通过实验证明了这些性质。
摘要:Importance sampling-based estimators for off-policy evaluation (OPE) are valued for their simplicity, unbiasedness, and reliance on relatively few assumptions. However, the variance of these estimators is often high, especially when trajectories are of different lengths. In this work, we introduce Omitting-States-Irrelevant-to-Return Importance Sampling (OSIRIS), an estimator which reduces variance by strategically omitting likelihood ratios associated with certain states. We formalize the conditions under which OSIRIS is unbiased and has lower variance than ordinary importance sampling, and we demonstrate these properties empirically.
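OSIRIS的核心想法是:在轨迹级重要性抽样的似然比连乘中,省略与回报无关状态对应的比值,从而降低方差。下面是一个极简Python示意(策略与轨迹均为虚构示例;哪些状态"与回报无关"在论文中由形式化条件判定,此处直接作为输入给出):

```python
def is_estimate(trajectory, pi_e, pi_b, relevant=None):
    """轨迹级重要性抽样估计; relevant 给出哪些时间步的似然比被保留.

    trajectory: [(s, a, r), ...]; pi_e, pi_b: 函数 (s, a) -> 概率.
    relevant=None 时退化为普通重要性抽样; 否则省略被标记为无关状态的似然比.
    """
    ratio, ret = 1.0, 0.0
    for t, (s, a, r) in enumerate(trajectory):
        if relevant is None or relevant[t]:
            ratio *= pi_e(s, a) / pi_b(s, a)
        ret += r
    return ratio * ret

# 玩具例子: 状态1处的动作不影响回报(该步奖励恒为0), 省略其似然比可降低方差
traj = [(0, "L", 1.0), (1, "R", 0.0), (0, "L", 1.0)]
pi_b = lambda s, a: 0.5
pi_e = lambda s, a: 0.8 if a == "L" else 0.2

print("普通IS:", is_estimate(traj, pi_e, pi_b))
print("OSIRIS式:", is_estimate(traj, pi_e, pi_b, relevant=[True, False, True]))
```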


【15】 Mitigating Catastrophic Forgetting in Scheduled Sampling with Elastic  Weight Consolidation in Neural Machine Translation
标题:利用弹性权重合并减轻神经机器翻译计划采样中的灾难性遗忘
链接:https://arxiv.org/abs/2109.06308

作者:Michalis Korakakis,Andreas Vlachos
机构:Department of Computer Science, University of Cambridge
摘要:尽管在许多序列到序列任务中表现出色,但使用最大似然估计训练的自回归模型仍存在暴露偏差,即训练期间使用的基本真理前缀与推理时使用的模型生成前缀之间存在差异。计划采样是一种简单且通常在经验上成功的方法,它通过将模型生成的前缀合并到训练过程中来解决此问题。然而,有人认为这是一个不一致的训练目标,导致模型完全忽略前缀。在本文中,我们进行了系统的实验,发现它通过增加模型对输入序列的依赖来改善曝光偏差。我们还观察到,作为一种副作用,当模型生成的前缀正确时,它会恶化性能,这是一种灾难性遗忘。我们建议使用弹性权重合并作为减少暴露偏差和保持输出质量之间的权衡。在两个IWSLT'14翻译任务上的实验表明,与标准计划抽样相比,我们的方法减轻了灾难性遗忘,显著提高了BLEU。
摘要:Despite strong performance in many sequence-to-sequence tasks, autoregressive models trained with maximum likelihood estimation suffer from exposure bias, i.e. a discrepancy between the ground-truth prefixes used during training and the model-generated prefixes used at inference time. Scheduled sampling is a simple and often empirically successful approach which addresses this issue by incorporating model-generated prefixes into the training process. However, it has been argued that it is an inconsistent training objective leading to models ignoring the prefixes altogether. In this paper, we conduct systematic experiments and find that it ameliorates exposure bias by increasing model reliance on the input sequence. We also observe that as a side-effect, it worsens performance when the model-generated prefix is correct, a form of catastrophic forgetting. We propose using Elastic Weight Consolidation as trade-off between mitigating exposure bias and retaining output quality. Experiments on two IWSLT'14 translation tasks demonstrate that our approach alleviates catastrophic forgetting and significantly improves BLEU compared to standard scheduled sampling.
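弹性权重合并(EWC)的罚项形式为 $\frac{\lambda}{2}\sum_i F_i(\theta_i-\theta_i^{*})^2$,其中 $F_i$ 为对角Fisher信息,$\theta^{*}$ 为旧行为下的参数快照。下面是一个示意性的Python实现(数值均为虚构示例,仅演示罚项本身):

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """EWC罚项: (lam/2) * sum_i F_i * (theta_i - theta*_i)^2.

    theta_star: 先前阶段(此处可理解为teacher-forcing训练后)的参数快照;
    fisher: 对角Fisher信息, 衡量各参数对旧行为的重要性.
    """
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

theta_star = np.array([1.0, -2.0, 0.5])     # 计划采样前的参数快照
fisher     = np.array([4.0,  0.1, 1.0])     # 重要参数(F大)偏离时罚得更重
theta      = np.array([1.2, -1.0, 0.5])     # 计划采样训练中的当前参数

# 总损失 = 计划采样损失 + EWC罚项, 在缓解暴露偏差与保持输出质量之间折中
print(ewc_penalty(theta, theta_star, fisher, lam=2.0))  # -> 0.26
```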


【16】 Multi-Sentence Resampling: A Simple Approach to Alleviate Dataset Length  Bias and Beam-Search Degradation
Link: https://arxiv.org/abs/2109.06253

Authors: Ivan Provilkov, Andrey Malinin
Affiliations: Yandex Research, Moscow; Moscow Institute of Physics and Technology; HSE University, Moscow
Abstract: Neural Machine Translation (NMT) is known to suffer from a beam-search problem: after a certain point, increasing beam size causes an overall drop in translation quality. This effect is especially pronounced for long sentences. While much work has been done analyzing this phenomenon, primarily for autoregressive NMT models, there is still no consensus on its underlying cause. In this work, we analyze errors that cause major quality degradation with large beams in NMT and Automatic Speech Recognition (ASR). We show that a factor that strongly contributes to the quality degradation with large beams is dataset length bias: NMT datasets are strongly biased towards short sentences. To mitigate this issue, we propose a new data augmentation technique, Multi-Sentence Resampling (MSR). This technique extends the training examples by concatenating several sentences from the original dataset to make a long training example. We demonstrate that MSR significantly reduces degradation with growing beam size and improves final translation quality on the IWSLT'15 En-Vi, IWSLT'17 En-Fr, and WMT'14 En-De datasets.
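A minimal sketch of MSR as described: longer training examples are formed by concatenating several source/target pairs. Whether the concatenated sentences are sampled independently or taken consecutively is a detail of the paper; the version below samples independently, which is an assumption.

```python
# Sketch of Multi-Sentence Resampling: build longer training examples by
# concatenating up to `max_concat` sentence pairs from the corpus.
import random

def multi_sentence_resample(pairs, max_concat=4, seed=0):
    """pairs: list of (src, tgt) strings. Returns a resampled corpus in
    which each example concatenates 1..max_concat original pairs,
    shifting the length distribution towards longer sentences."""
    rng = random.Random(seed)
    out = []
    for _ in range(len(pairs)):
        k = rng.randint(1, max_concat)
        chosen = [pairs[rng.randrange(len(pairs))] for _ in range(k)]
        src = " ".join(s for s, _ in chosen)
        tgt = " ".join(t for _, t in chosen)
        out.append((src, tgt))
    return out
```

Because source and target are concatenated in the same order, each synthetic example remains a valid parallel pair while directly counteracting the corpus bias towards short sentences.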


【17】 Physics Driven Domain Specific Transporter Framework with Attention  Mechanism for Ultrasound Imaging
Link: https://arxiv.org/abs/2109.06346

Authors: Arpan Tripathi, Abhilash Rakkunedeth, Mahesh Raveendranatha Panicker, Jack Zhang, Naveenjyote Boora, Jessica Knight, Jacob Jaremko, Yale Tung Chen, Kiran Vishnu Narayan, Kesavadas C
Affiliations: University of Alberta; Hospital Universitario Puerta de Hierro, Spain
Note: 11 pages, 18 figures (including supplementary material)
Abstract: Most applications of deep learning techniques in medical imaging are supervised and require a large amount of labeled data, which is expensive and demands many hours of careful annotation by experts. In this paper, we propose an unsupervised, physics-driven, domain-specific transporter framework with an attention mechanism to identify relevant key points, with applications in ultrasound imaging. The proposed framework identifies key points that provide a concise geometric representation highlighting regions with high structural variation in ultrasound videos. We incorporate physics-driven domain-specific information as a feature probability map and use the Radon transform to highlight features in specific orientations. The proposed framework has been trained on 130 lung ultrasound (LUS) videos and 113 wrist ultrasound (WUS) videos, and validated on 100 LUS videos and 58 WUS videos acquired from multiple centers across the globe. Images from both datasets were independently assessed by experts to identify clinically relevant features such as A-lines, B-lines, and pleura from LUS, and the radial metaphysis, radial epiphysis, and carpal bones from WUS videos. The key points detected from both datasets showed high sensitivity (LUS = 99%, WUS = 74%) in detecting the image landmarks identified by experts. When employed to classify a given lung image as normal or abnormal, the proposed approach, even with no prior training, achieved an average accuracy of 97% and an average F1-score of 95% on the co-classification task with 3-fold cross-validation. Given the purely unsupervised nature of the proposed approach, we expect key-point detection to broaden the applicability of ultrasound in various examinations performed in emergency and point-of-care settings.
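The orientation-highlighting step can be illustrated with scikit-image's Radon transform: keep only the projections in a chosen angle band and reconstruct, which emphasizes structures of that orientation (for instance, near-horizontal A-lines and pleura in lung ultrasound). This is a sketch of the idea only, not the authors' transporter pipeline, and the angle band is a tunable assumption.

```python
# Sketch: use the Radon transform to emphasize features at specific
# orientations in an ultrasound frame.
import numpy as np
from skimage.transform import radon, iradon

def orientation_map(image: np.ndarray, band=(80.0, 100.0), n_theta=180):
    """Zero out all projections outside `band` (degrees) and reconstruct,
    yielding a map that highlights structures in that orientation. Which
    band corresponds to horizontal structures depends on the angle
    convention, so treat it as a parameter to tune."""
    theta = np.linspace(0.0, 180.0, n_theta, endpoint=False)
    sinogram = radon(image, theta=theta, circle=False)
    keep = (theta >= band[0]) & (theta <= band[1])
    sinogram[:, ~keep] = 0.0  # suppress all other orientations
    return iradon(sinogram, theta=theta, circle=False, filter_name="ramp")
```

Such an orientation map could then be combined with other physics-driven cues into the feature probability map that the abstract describes feeding into the key-point detector.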


【18】 In-filter Computing For Designing Ultra-light Acoustic Pattern  Recognizers
Link: https://arxiv.org/abs/2109.06171

Authors: Abhishek Ramdas Nair, Shantanu Chakrabartty, Chetan Singh Thakur
Affiliations: Washington University in St. Louis
Note: in IEEE Internet of Things Journal
Abstract: We present a novel in-filter computing framework that can be used for designing ultra-light acoustic classifiers for use in smart internet-of-things (IoTs). Unlike a conventional acoustic pattern recognizer, where the feature extraction and classification are designed independently, the proposed architecture integrates the convolution and nonlinear filtering operations directly into the kernels of a Support Vector Machine (SVM). The result of this integration is a template-based SVM whose memory and computational footprint (training and inference) is light enough to be implemented on an FPGA-based IoT platform. While the proposed in-filter computing framework is general, in this paper we demonstrate the concept using a Cascade of Asymmetric Resonator with Inner Hair Cells (CAR-IHC) based acoustic feature extraction algorithm. The complete system has been optimized using time-multiplexing and parallel-pipeline techniques for a Xilinx Spartan 7 series Field Programmable Gate Array (FPGA). We show that the system can achieve robust classification performance on benchmark sound recognition tasks using only ~1.5k Look-Up Tables (LUTs) and ~2.8k Flip-Flops (FFs), a significant improvement over other approaches.
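A toy sketch of the in-filter idea: the SVM kernel evaluation itself performs the filtering on the raw waveform, so stored templates act as support vectors and no separate feature-extraction stage exists. The correlation-plus-tanh kernel below is a simplified stand-in for the CAR-IHC filter cascade, not the paper's actual kernel.

```python
# Sketch: a template-based SVM whose kernel embeds (non)linear filtering,
# so classification operates directly on the raw audio samples.
import numpy as np

def filter_kernel(x: np.ndarray, template: np.ndarray) -> float:
    """Correlate the raw waveform with a stored template and pass the
    response through a saturating nonlinearity (tanh stands in for the
    CAR-IHC cascade here)."""
    response = np.correlate(x, template, mode="valid")
    return float(np.tanh(response).max())

def svm_decision(x, templates, alphas, bias):
    """Template-based SVM: templates play the role of support vectors,
    alphas the learned dual weights."""
    return sum(a * filter_kernel(x, t)
               for a, t in zip(alphas, templates)) + bias
```

Collapsing filtering into the kernel is what keeps the memory footprint down to a handful of templates, which is why the design fits in a few thousand LUTs and flip-flops on the FPGA.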


Machine translations are for reference only.

