Machine Learning Academic Digest [9.16]

cs.LG: 75 papers today

Graph (graph learning | graph neural networks | graph optimization, etc.) (3 papers)

【1】 Fusion with Hierarchical Graphs for Mulitmodal Emotion Recognition
Link: https://arxiv.org/abs/2109.07149

Authors: Shuyun Tang, Zhaojie Luo, Guoshun Nan, Yuichiro Yoshikawa, Ishiguro Hiroshi
Affiliations: UC Berkeley, Osaka University, Singapore University of Technology and Design
Abstract: Automatic emotion recognition (AER) based on enriched multimodal inputs, including text, speech, and visual clues, is crucial in the development of emotionally intelligent machines. Although complex modality relationships have been proven effective for AER, they are still largely underexplored because previous works predominantly relied on various fusion mechanisms with simply concatenated features to learn multimodal representations for emotion classification. This paper proposes a novel hierarchical fusion graph convolutional network (HFGCN) model that learns more informative multimodal representations by considering the modality dependencies during the feature fusion procedure. Specifically, the proposed model fuses multimodality inputs using a two-stage graph construction approach and encodes the modality dependencies into the conversation representation. We verified the interpretable capabilities of the proposed method by projecting the emotional states to a 2D valence-arousal (VA) subspace. Extensive experiments showed the effectiveness of our proposed model for more accurate AER, which yielded state-of-the-art results on two public datasets, IEMOCAP and MELD.

【2】 Graph Embedding via Diffusion-Wavelets-Based Node Feature Distribution Characterization
Link: https://arxiv.org/abs/2109.07016

Authors: Lili Wang, Chenghan Huang, Weicheng Ma, Xinyuan Cao, Soroush Vosoughi
Affiliations: Dartmouth College, Hanover, New Hampshire, USA; Millennium Management, LLC, New York, New York, USA; Georgia Institute of Technology, Atlanta, Georgia, USA
Comments: In CIKM 2021
Abstract: Recent years have seen a rise in the development of representational learning methods for graph data. Most of these methods, however, focus on node-level representation learning at various scales (e.g., microscopic, mesoscopic, and macroscopic node embedding). In comparison, methods for representation learning on whole graphs are currently relatively sparse. In this paper, we propose a novel unsupervised whole graph embedding method. Our method uses spectral graph wavelets to capture topological similarities on each k-hop sub-graph between nodes and uses them to learn embeddings for the whole graph. We evaluate our method against 12 well-known baselines on 4 real-world datasets and show that our method achieves the best performance across all experiments, outperforming the current state-of-the-art by a considerable margin.

【3】 HeMI: Multi-view Embedding in Heterogeneous Graphs
Link: https://arxiv.org/abs/2109.07008

Authors: Costas Mavromatis, George Karypis
Abstract: Many real-world graphs involve different types of nodes and relations between nodes, being heterogeneous by nature. The representation learning of heterogeneous graphs (HGs) embeds the rich structure and semantics of such graphs into a low-dimensional space and facilitates various data mining tasks, such as node classification, node clustering, and link prediction. In this paper, we propose a self-supervised method that learns HG representations by relying on knowledge exchange and discovery among different HG structural semantics (meta-paths). Specifically, by maximizing the mutual information of meta-path representations, we promote meta-path information fusion and consensus, and ensure that globally shared semantics are encoded. By extensive experiments on node classification, node clustering, and link prediction tasks, we show that the proposed self-supervision both outperforms and improves competing methods by 1% and up to 10% for all tasks.

Transformer (2 papers)

【1】 Matching with Transformers in MELT
Link: https://arxiv.org/abs/2109.07401

Authors: Sven Hertling, Jan Portisch, Heiko Paulheim
Affiliations: Data and Web Science Group, University of Mannheim, Germany; SAP SE Business Technology Platform - One Domain Model, Walldorf, Germany
Comments: Accepted at the Ontology Matching Workshop at the International Semantic Web Conference (ISWC 2021)
Abstract: One of the strongest signals for automated matching of ontologies and knowledge graphs is the textual descriptions of the concepts. The methods that are typically applied (such as character- or token-based comparisons) are relatively simple, and therefore do not capture the actual meaning of the texts. With the rise of transformer-based language models, text comparison based on meaning (rather than lexical features) is possible. In this paper, we model the ontology matching task as a classification problem and present approaches based on transformer models. We further provide an easy-to-use implementation in the MELT framework which is suited for ontology and knowledge graph matching. We show that a transformer-based filter helps to choose the correct correspondences given a high-recall alignment and already achieves a good result with simple alignment post-processing methods.

【2】 Explainable Identification of Dementia from Transcripts using Transformer Networks
Link: https://arxiv.org/abs/2109.06980

Authors: Loukas Ilias, Dimitris Askounis
Affiliations: School of Electrical and Computer Engineering, National Technical University of Athens, Zografou, Athens
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Abstract: Alzheimer's disease (AD) is the main cause of dementia, which is accompanied by loss of memory and may lead to severe consequences in people's everyday lives if not diagnosed in time. Very few works have exploited transformer-based networks, and despite the high accuracy achieved, little work has been done in terms of model interpretability. In addition, although Mini-Mental State Exam (MMSE) scores are inextricably linked with the identification of dementia, research works treat dementia identification and the prediction of MMSE scores as two separate tasks. In order to address these limitations, we employ several transformer-based models, with BERT achieving the highest accuracy of 85.56%. Concurrently, we propose an interpretable method to detect AD patients based on siamese networks, reaching accuracy up to 81.18%. Next, we introduce two multi-task learning models, where the main task refers to the identification of dementia (binary classification), while the auxiliary one corresponds to the identification of the severity of dementia (multiclass classification). Our model obtains accuracy equal to 84.99% on the detection of AD patients in the multi-task learning setting. Finally, we present some new methods to identify the linguistic patterns used by AD patients and non-AD ones, including text statistics, vocabulary uniqueness, word usage, correlations via a detailed linguistic analysis, and explainability techniques (LIME). Findings indicate significant differences in language between AD and non-AD patients.

GAN | Adversarial | Attack | Generation (4 papers)

【1】 NBcoded: network attack classifiers based on Encoder and Naive Bayes model for resource limited devices
Link: https://arxiv.org/abs/2109.07273

Authors: Lander Segurola-Gil, Francesco Zola, Xabier Echeberria-Barrio, Raul Orduna-Urrutia
Affiliations: Vicomtech Foundation, Basque Research and Technology Alliance (BRTA); Institute of Smart Cities, Public University of Navarre, Pamplona, Spain
Comments: It will be published in "Communications in Computer and Information Science" and presented in the 3rd Workshop of Machine Learning for Cybersecurity (MLCS)
Abstract: In recent years, cybersecurity has gained high relevance, converting the detection of attacks or intrusions into a key task. In fact, a small breach in a system, application, or network can cause huge damage for companies. However, when this attack detection encounters the Artificial Intelligence paradigm, it can be addressed using high-quality classifiers which often have high resource demands in terms of computation or memory usage. This situation has a high impact when the attack classifiers need to be used with resource-limited devices or without overloading the performance of the devices, as happens for example in IoT devices or in industrial systems. To overcome this issue, NBcoded, a novel lightweight attack classification tool, is proposed in this work. NBcoded works in a pipeline combining the noise-removal properties of encoders with the low resource and time consumption of the Naive Bayes classifier. This work compares three different NBcoded implementations based on three different Naive Bayes likelihood distribution assumptions (Gaussian, Complement and Bernoulli). Then, the best NBcoded is compared with state-of-the-art classifiers like Multilayer Perceptron and Random Forest. Our implementation proves to be the best model at reducing the impact of training time and disk usage, even if it is outperformed by the other two in terms of Accuracy and F1-score (~ 2%).

【2】 Adversarial Mixing Policy for Relaxing Locally Linear Constraints in Mixup
Link: https://arxiv.org/abs/2109.07177

Authors: Guang Liu, Yuzhao Mao, Hailong Huang, Weiguo Gao, Xuan Li
Affiliations: PingAn Life Insurance of China
Comments: This paper is accepted to appear in the main conference of EMNLP2021
Abstract: Mixup is a recent regularizer for current deep classification networks. Through training a neural network on convex combinations of pairs of examples and their labels, it imposes locally linear constraints on the model's input space. However, such strict linear constraints often lead to under-fitting which degrades the effects of regularization. Notably, this issue becomes more serious when the resource is extremely limited. To address these issues, we propose the Adversarial Mixing Policy (AMP), organized in a min-max-rand formulation, to relax the Locally Linear Constraints in Mixup. Specifically, AMP adds a small adversarial perturbation to the mixing coefficients rather than the examples. Thus, slight non-linearity is injected in-between the synthetic examples and synthetic labels. By training on these data, the deep networks are further regularized, and thus achieve a lower predictive error rate. Experiments on five text classification benchmarks and five backbone models have empirically shown that our methods reduce the error rate over Mixup variants by a significant margin (up to 31.3%), especially in low-resource conditions (up to 17.5%).
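
Illustration (not from the paper): the "convex combinations of pairs of examples and their labels" mentioned in the abstract can be sketched in a few lines of pure Python. This shows plain Mixup only; AMP's adversarial perturbation of the mixing coefficient is not reproduced here. The function name `mixup_pair` and the default `alpha=0.4` are assumptions chosen for the example.

```python
import random

def mixup_pair(x1, y1, x2, y2, alpha=0.4, rng=random):
    """Mix two examples and their one-hot labels with the same coefficient.

    lam is drawn from Beta(alpha, alpha), the usual choice in Mixup;
    alpha=0.4 is an illustrative default, not a value from the paper.
    """
    lam = rng.betavariate(alpha, alpha)
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x_mix, y_mix, lam

# Two toy feature vectors with one-hot labels for classes 0 and 1.
x_mix, y_mix, lam = mixup_pair([1.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.0, 1.0])
```

Because inputs and labels share the same `lam`, the synthetic label is an exactly linear function of the synthetic input; this is the locally linear constraint that the abstract says AMP relaxes.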

【3】 Balancing detectability and performance of attacks on the control channel of Markov Decision Processes
Link: https://arxiv.org/abs/2109.07171

Authors: Alessio Russo, Alexandre Proutiere
Affiliations: Division of Decision and Control Systems, EECS School, KTH Royal Institute of Technology, Stockholm
Abstract: We investigate the problem of designing optimal stealthy poisoning attacks on the control channel of Markov decision processes (MDPs). This research is motivated by the recent interest of the research community in adversarial and poisoning attacks applied to MDPs and reinforcement learning (RL) methods. The policies resulting from these methods have been shown to be vulnerable to attacks perturbing the observations of the decision-maker. In such an attack, drawing inspiration from adversarial examples used in supervised learning, the amplitude of the adversarial perturbation is limited according to some norm, with the hope that this constraint will make the attack imperceptible. However, such constraints do not grant any level of undetectability and do not take into account the dynamic nature of the underlying Markov process. In this paper, we propose a new attack formulation, based on information-theoretical quantities, that considers the objective of minimizing the detectability of the attack as well as the performance of the controlled process. We analyze the trade-off between the efficiency of the attack and its detectability. We conclude with examples and numerical simulations illustrating this trade-off.

【4】 Universal Adversarial Attack on Deep Learning Based Prognostics
Link: https://arxiv.org/abs/2109.07142

Authors: Arghya Basak, Pradeep Rathore, Sri Harsha Nistala, Sagar Srinivas, Venkataramana Runkana
Affiliations: TCS Research, Pune, India
Comments: 7 pages
Abstract: Deep learning-based time series models are being extensively utilized in engineering and manufacturing industries for process control and optimization, asset monitoring, diagnostic and predictive maintenance. These models have shown great improvement in the prediction of the remaining useful life (RUL) of industrial equipment but suffer from inherent vulnerability to adversarial attacks. These attacks can be easily exploited and can lead to catastrophic failure of critical industrial equipment. In general, different adversarial perturbations are computed for each instance of the input data. This is, however, difficult for the attacker to achieve in real time due to higher computational requirements and lack of uninterrupted access to the input data. Hence, we present the concept of universal adversarial perturbation, a special imperceptible noise to fool regression-based RUL prediction models. Attackers can easily utilize universal adversarial perturbations for real-time attacks since continuous access to input data and repetitive computation of adversarial perturbations are not a prerequisite for the same. We evaluate the effect of universal adversarial attacks using the NASA turbofan engine dataset. We show that addition of universal adversarial perturbation to any instance of the input data increases error in the output predicted by the model. To the best of our knowledge, we are the first to study the effect of the universal adversarial perturbation on time series regression models. We further demonstrate the effect of varying the strength of perturbations on RUL prediction models and find that model accuracy decreases with the increase in perturbation strength of the universal adversarial attack. We also showcase that universal adversarial perturbation can be transferred across different models.

Semi-/Weakly/Un-/Supervised | Uncertainty | Active Learning (4 papers)

【1】 SupCL-Seq: Supervised Contrastive Learning for Downstream Optimized Sequence Representations
Link: https://arxiv.org/abs/2109.07424

Authors: Hooman Sedghamiz, Shivam Raval, Enrico Santus, Tuka Alhanai, Mohammad Ghassemi
Affiliations: DSIG - Bayer Pharmaceuticals, New Jersey, USA; New York University, Abu Dhabi, UAE; Michigan State University, Michigan, USA
Comments: short paper, EMNLP 2021, Findings
Abstract: While contrastive learning is proven to be an effective training strategy in computer vision, Natural Language Processing (NLP) is only recently adopting it as a self-supervised alternative to Masked Language Modeling (MLM) for improving sequence representations. This paper introduces SupCL-Seq, which extends the supervised contrastive learning from computer vision to the optimization of sequence representations in NLP. By altering the dropout mask probability in standard Transformer architectures, for every representation (anchor), we generate augmented altered views. A supervised contrastive loss is then utilized to maximize the system's capability of pulling together similar samples (e.g., anchors and their altered views) and pushing apart the samples belonging to the other classes. Despite its simplicity, SupCL-Seq leads to large gains in many sequence classification tasks on the GLUE benchmark compared to a standard BERTbase, including 6% absolute improvement on CoLA, 5.4% on MRPC, 4.7% on RTE and 2.6% on STSB. We also show consistent gains over self-supervised contrastively learned representations, especially in non-semantic tasks. Finally, we show that these gains are not solely due to augmentation, but rather to a downstream optimized sequence representation. Code: https://github.com/hooman650/SupCL-Seq
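
Illustration (not taken from the SupCL-Seq repository): a minimal pure-Python sketch of a supervised contrastive loss in its commonly used form, where every other sample with the same label counts as a positive for the anchor. The embeddings are assumed to be given; the dropout-based view generation described in the abstract is not shown. The function names and `temperature=0.1` are assumptions for the example.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def supcon_loss(embeddings, labels, temperature=0.1):
    """Average supervised contrastive loss over anchors that have a positive."""
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Temperature-scaled cosine similarity between anchor i and every sample.
        sims = [sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)]
        # Denominator runs over all samples except the anchor itself.
        log_denom = math.log(sum(math.exp(sims[j]) for j in range(n) if j != i))
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

# Same-class samples close together versus same-class samples far apart.
well_separated = supcon_loss([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]], [0, 0, 1, 1])
mixed_up = supcon_loss([[1, 0], [0, 1], [0.9, 0.1], [0.1, 0.9]], [0, 0, 1, 1])
```

The loss is small when same-label embeddings are close and large when they are not, which is the "pull together, push apart" behaviour the abstract describes.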

【2】 Improving Robustness and Efficiency in Active Learning with Contrastive Loss
Link: https://arxiv.org/abs/2109.06873

Authors: Ranganath Krishnan, Nilesh Ahuja, Alok Sinha, Mahesh Subedar, Omesh Tickoo, Ravi Iyer
Affiliations: Intel Labs, Hillsboro, USA; Intel Corporation, Bangalore, India
Comments: arXiv admin note: substantial text overlap with arXiv:2109.06321
Abstract: This paper introduces supervised contrastive active learning (SCAL) by leveraging the contrastive loss for active learning in a supervised setting. We propose efficient query strategies in active learning to select unbiased and informative data samples of diverse feature representations. We demonstrate that our proposed method reduces sampling bias and achieves state-of-the-art accuracy and model calibration in an active learning setup, with the query computation 11x faster than CoreSet and 26x faster than Bayesian active learning by disagreement. Our method yields well-calibrated models even with imbalanced datasets. We also evaluate robustness to dataset shift and out-of-distribution data in an active learning setup and demonstrate that our proposed SCAL method outperforms high-performing compute-intensive methods by a bigger margin (average 8.9% higher AUROC for out-of-distribution detection and average 7.2% lower ECE under dataset shift).

【3】 The potential of self-supervised networks for random noise suppression in seismic data
Link: https://arxiv.org/abs/2109.07344

Authors: Claire Birnie, Matteo Ravasi, Tariq Alkhalifah, Sixiu Liu
Affiliations: KAUST, Thuwal, Kingdom of Saudi Arabia
Abstract: Noise suppression is an essential step in any seismic processing workflow. A portion of this noise, particularly in land datasets, presents itself as random noise. In recent years, neural networks have been successfully used to denoise seismic data in a supervised fashion. However, supervised learning always comes with the often unachievable requirement of having noisy-clean data pairs for training. Using blind-spot networks, we redefine the denoising task as a self-supervised procedure where the network uses the surrounding noisy samples to estimate the noise-free value of a central sample. Based on the assumption that noise is statistically independent between samples, the network struggles to predict the noise component of the sample due to its randomness, whilst the signal component is accurately predicted due to its spatio-temporal coherency. Illustrated on synthetic examples, the blind-spot network is shown to be an efficient denoiser of seismic data contaminated by random noise with minimal damage to the signal, therefore providing improvements in both the image domain and down-the-line tasks, such as inversion. To conclude the study, the suggested approach is applied to field data and the results are compared with two commonly used random denoising techniques: FX-deconvolution and Curvelet transform. By demonstrating that blind-spot networks are an efficient suppressor of random noise, we believe this is just the beginning of utilising self-supervised learning in seismic applications.

【4】 Evolutionary Reinforcement Learning Dynamics with Irreducible Environmental Uncertainty
Link: https://arxiv.org/abs/2109.07259

Authors: Wolfram Barfuss, Richard P. Mann
Affiliations: University of Tübingen, Germany; University of Leeds
Comments: 14 pages, 7 figures
Abstract: In this work we derive and present evolutionary reinforcement learning dynamics in which the agents are irreducibly uncertain about the current state of the environment. We evaluate the dynamics across different classes of partially observable agent-environment systems and find that irreducible environmental uncertainty can lead to better learning outcomes faster, stabilize the learning process and overcome social dilemmas. However, as expected, we also find that partial observability may cause worse learning outcomes, for example, in the form of a catastrophic limit cycle. Compared to fully observant agents, learning with irreducible environmental uncertainty often requires more exploration and less weight on future rewards to obtain the best learning outcomes. Furthermore, we find a range of dynamical effects induced by partial observability, e.g., a critical slowing down of the learning processes between reward regimes and the separation of the learning dynamics into fast and slow directions. The presented dynamics are a practical tool for researchers in biology, social science and machine learning to systematically investigate the evolutionary effects of environmental uncertainty.

Reinforcement Learning (4 papers)

【1】 DCUR: Data Curriculum for Teaching via Samples with Reinforcement Learning
Link: https://arxiv.org/abs/2109.07380

Authors: Daniel Seita, Abhinav Gopal, Zhao Mandi, John Canny
Affiliations: University of California
Comments: Supplementary material is available at https://tinyurl.com/teach-dcur
Abstract: Deep reinforcement learning (RL) has shown great empirical successes, but suffers from brittleness and sample inefficiency. A potential remedy is to use a previously-trained policy as a source of supervision. In this work, we refer to these policies as teachers and study how to transfer their expertise to new student policies by focusing on data usage. We propose a framework, Data CUrriculum for Reinforcement learning (DCUR), which first trains teachers using online deep RL, and stores the logged environment interaction history. Then, students learn either by running offline RL or by using teacher data in combination with a small amount of self-generated data. DCUR's central idea involves defining a class of data curricula which, as a function of training time, limits the student to sampling from a fixed subset of the full teacher data. We test teachers and students using state-of-the-art deep RL algorithms across a variety of data curricula. Results suggest that the choice of data curricula significantly impacts student learning, and that it is beneficial to limit the data during early training stages while gradually letting the data availability grow over time. We identify when the student can learn offline and match teacher performance without relying on specialized offline RL algorithms. Furthermore, we show that collecting a small fraction of online data provides complementary benefits with the data curriculum. Supplementary material is available at https://tinyurl.com/teach-dcur.
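
Illustration (not the authors' code): one way to read "as a function of training time, limits the student to sampling from a fixed subset of the full teacher data" is a prefix of the teacher's logged buffer that grows as student training proceeds. The linear schedule, the `min_fraction` floor, and the function names below are all assumptions made for this sketch; the paper evaluates a variety of curricula.

```python
import random

def allowed_prefix(step, total_steps, buffer_size, min_fraction=0.1):
    """Number of teacher samples (oldest first) the student may draw from at `step`.

    Hypothetical linear schedule: the available fraction grows with training
    time, never dropping below `min_fraction` and never exceeding 1.0.
    """
    frac = min(1.0, max(min_fraction, (step + 1) / total_steps))
    return max(1, int(frac * buffer_size))

def sample_batch(teacher_buffer, step, total_steps, batch_size, rng=random):
    """Sample uniformly from the currently allowed prefix of the teacher data."""
    limit = allowed_prefix(step, total_steps, len(teacher_buffer))
    return [teacher_buffer[rng.randrange(limit)] for _ in range(batch_size)]

teacher_buffer = list(range(1000))  # stand-in for logged transitions, oldest first
early = sample_batch(teacher_buffer, step=0, total_steps=100, batch_size=32)
late = sample_batch(teacher_buffer, step=99, total_steps=100, batch_size=32)
```

Early batches come only from the oldest tenth of the buffer, while the final batches can draw from all of it, which matches the abstract's finding that restricting data early and growing availability over time is beneficial.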

【2】 Back to Basics: Deep Reinforcement Learning in Traffic Signal Control
Link: https://arxiv.org/abs/2109.07180

Authors: Sierk Kanis, Laurens Samson, Daan Bloembergen, Tim Bakker
Affiliations: University of Amsterdam, Amsterdam, The Netherlands; CTO, City of Amsterdam
Comments: 9 pages, 4 figures; code for this paper is available at this https URL
Abstract: In this paper we revisit some of the fundamental premises for a reinforcement learning (RL) approach to self-learning traffic lights. We propose RLight, a combination of choices that offers robust performance and good generalization to unseen traffic flows. In particular, our main contributions are threefold: our lightweight and cluster-aware state representation leads to improved performance; we reformulate the MDP such that it skips redundant timesteps of yellow light, speeding up learning by 30%; and we investigate the action space and provide insight into the difference in performance between acyclic and cyclic phase transitions. Additionally, we provide insights into the generalisation of the methods to unseen traffic. Evaluations using the real-world Hangzhou traffic dataset show that RLight outperforms state-of-the-art rule-based and deep reinforcement learning algorithms, demonstrating the potential of RL-based methods to improve urban traffic flows.

【3】 Optimal Cycling of a Heterogenous Battery Bank via Reinforcement Learning
Link: https://arxiv.org/abs/2109.07137

Authors: Vivek Deulkar, Jayakrishnan Nair
Affiliations: Dept. of Electrical Engineering, IIT Bombay
Comments: Appeared at the IEEE SmartGridComm 2021 conference
Abstract: We consider the problem of optimal charging/discharging of a bank of heterogeneous battery units, driven by stochastic electricity generation and demand processes. The batteries in the battery bank may differ with respect to their capacities, ramp constraints, losses, as well as cycling costs. The goal is to minimize the degradation costs associated with battery cycling in the long run; this is posed formally as a Markov decision process. We propose a linear function approximation based Q-learning algorithm for learning the optimal solution, using a specially designed class of kernel functions that approximate the structure of the value functions associated with the MDP. The proposed algorithm is validated via an extensive case study.
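
Illustration (not the paper's algorithm): the generic semi-gradient Q-learning update with a linear function approximator can be sketched as below. It is written in the reward-maximizing form; a cost-minimizing formulation such as the battery problem above would take a minimum over next actions instead. The feature vectors are placeholders, not the specially designed kernel functions from the paper, and all names and step sizes are assumptions.

```python
def q_value(weights, features):
    """Linear approximation: Q(s, a) is the dot product of weights and phi(s, a)."""
    return sum(w * f for w, f in zip(weights, features))

def q_learning_step(weights, phi_sa, reward, next_features, alpha=0.1, gamma=0.9):
    """One semi-gradient Q-learning update.

    next_features holds phi(s', a') for every action a' available in s';
    an empty list is treated as a terminal next state.
    """
    best_next = max((q_value(weights, phi) for phi in next_features), default=0.0)
    td_error = reward + gamma * best_next - q_value(weights, phi_sa)
    return [w + alpha * td_error * f for w, f in zip(weights, phi_sa)]

# One update from zero weights with a reward of 1 and two candidate next actions.
weights = q_learning_step([0.0, 0.0], phi_sa=[1.0, 0.0], reward=1.0,
                          next_features=[[0.0, 1.0], [1.0, 0.0]])
```

Only the weights attached to active features of the visited state-action pair move, scaled by the temporal-difference error; the quality of the approximation therefore depends heavily on how the features are chosen.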

【4】 WaveCorr: Correlation-savvy Deep Reinforcement Learning for Portfolio Management
Link: https://arxiv.org/abs/2109.07005

Authors: Saeed Marzban, Erick Delage, Jonathan Yumeng Li, Jeremie Desgagne-Bouchard, Carl Dussault
Affiliations: GERAD & Department of Decision Sciences, HEC Montréal, Montreal, Canada; Telfer School of Management, University of Ottawa, Ottawa, Canada; Evovest, Montreal, Canada
Abstract: The problem of portfolio management represents an important and challenging class of dynamic decision making problems, where rebalancing decisions need to be made over time with the consideration of many factors such as investors' preferences, trading environments, and market conditions. In this paper, we present a new portfolio policy network architecture for deep reinforcement learning (DRL) that can exploit cross-asset dependency information more effectively and achieve better performance than state-of-the-art architectures. In particular, we introduce a new property, referred to as asset permutation invariance, for portfolio policy networks that exploit multi-asset time series data, and design the first portfolio policy network, named WaveCorr, that preserves this invariance property when treating asset correlation information. At the core of our design is an innovative permutation invariant correlation processing layer. An extensive set of experiments is conducted using data from both Canadian (TSX) and American stock markets (S&P 500), and WaveCorr consistently outperforms other architectures with an impressive 3%-25% absolute improvement in terms of average annual return, and up to more than 200% relative improvement in average Sharpe ratio. We also measured an improvement of a factor of up to 5 in the stability of performance under random choices of initial asset ordering and weights. The stability of the network has been found to be particularly valuable by our industrial partner.

Medical (3 papers)

【1】 Modelling Major Disease Outbreaks in the 21st Century: A Causal Approach
Link: https://arxiv.org/abs/2109.07266

Authors: Abli Marathe, Saloni Parekh, Harsh Sakhrani
Affiliations: Dept. of Information Technology, Pune Institute of Computer, Pune, India
Comments: Accepted at Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD-epiDAMIK) 2021: The 4th International Workshop on Epidemiology meets Data Mining and Knowledge discovery
Abstract: Epidemiologists aiming to model the dynamics of global events face a significant challenge in identifying the factors linked with anomalies such as disease outbreaks. In this paper, we present a novel method for identifying the most important development sectors sensitive to disease outbreaks by using global development indicators as markers. We use statistical methods to assess the causative linkages between these indicators and disease outbreaks, as well as to find the most frequently ranked indicators. In addition to statistical analysis, we use data imputation techniques to convert raw real-world data sets into meaningful data for causal inference. The application of various algorithms for detecting causal linkages between the indicators is the subject of this research. Although disparities in governmental policies between countries account for differences in causal linkages, several indicators emerge as important determinants sensitive to disease outbreaks around the world in the 21st century.

【2】 WIP: Medical Incident Prediction Through Analysis of Electronic Medical Records Using Machine Learning: Fall Prediction
Link: https://arxiv.org/abs/2109.07106

Authors: Atsushi Yanagisawa, Chintaka Premachandra, Hiruharu Kawanaka, Atsushi Inoue, Takeo Hata, Eiichiro Ueda
Affiliations: Takatsuki High School, Japan; Shibaura Institute of Technology, Japan; Mie University, Japan; Osaka Medical and Pharmaceutical University, Japan
Abstract: This paper reports our preliminary work on medical incident prediction in general, and fall risk prediction in particular, using machine learning. Data for the machine learning are generated only from a particular subset of the electronic medical records (EMR) at Osaka Medical and Pharmaceutical University Hospital. We conducted three experiments: (1) a comparison of machine learning algorithms, (2) handling of class imbalance, and (3) an investigation of the contribution of explanatory variables to fall incident prediction. Of these, we find the investigation of explanatory variables to be the most effective.

【3】 Deploying clinical machine learning? Consider the following...
Link: https://arxiv.org/abs/2109.06919

Authors: Charles Lu, Ken Chang, Praveer Singh, Stuart Pomerantz, Sean Doyle, Sujay Kakarmath, Christopher Bridge, Jayashree Kalpathy-Cramer
Affiliations: Massachusetts General Hospital, Boston, MA; Massachusetts Institute of Technology, Cambridge, MA; Harvard Medical School, Boston, MA; Mass General Brigham, Boston, MA
Abstract: Despite intense attention to and investment in clinical machine learning (CML) research, relatively few applications translate to clinical practice. While research is important in advancing the state of the art, translation is equally important in bringing these technologies into a position to ultimately impact patient care and live up to the extensive expectations surrounding AI in healthcare. To better characterize a holistic perspective among researchers and practitioners, we survey several participants with experience in developing CML for clinical deployment about the lessons they have learned. We collate these insights and identify several main categories of barriers and pitfalls in order to better design and develop clinical machine learning applications.

Distillation | Knowledge Extraction (2 papers)

【1】 Constraint based Knowledge Base Distillation in End-to-End Task Oriented Dialogs
Link: https://arxiv.org/abs/2109.07396

Authors: Dinesh Raghu, Atishya Jain, Mausam, Sachindra Joshi
Affiliations: IIT Delhi, New Delhi, India; IBM Research, New Delhi, India
Comments: D. Raghu and A. Jain contributed equally to this work
Abstract: End-to-end task-oriented dialogue systems generate responses based on the dialog history and an accompanying knowledge base (KB). Inferring the KB entities that are most relevant to an utterance is crucial for response generation. The existing state of the art scales to large KBs by softly filtering out irrelevant KB information. In this paper, we propose a novel filtering technique that consists of (1) a pairwise-similarity-based filter that identifies relevant information by respecting the n-ary structure of a KB record, and (2) an auxiliary loss that helps separate contextually unrelated KB information. We also propose a new metric, multiset entity F1, which fixes a correctness issue in the existing entity F1 metric. Experimental results on three publicly available task-oriented dialog datasets show that our proposed approach outperforms existing state-of-the-art models.
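The abstract above does not define multiset entity F1, so the following is only a guess at the general idea: an F1 score in which repeated entities count with their multiplicity instead of being collapsed into a set. The function name `multiset_entity_f1` and the exact matching rule are assumptions made for illustration, not the paper's definition.

```python
from collections import Counter


def multiset_entity_f1(predicted, gold):
    """F1 over entity multisets: each repeated entity counts once per occurrence."""
    pred_counts = Counter(predicted)
    gold_counts = Counter(gold)
    # Counter intersection keeps the minimum count of each shared entity.
    overlap = sum((pred_counts & gold_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)


# A set-based F1 would treat these two responses as identical entity sets;
# the multiset version penalizes the mismatched repetition counts.
score = multiset_entity_f1(["hotel_a", "hotel_a", "cafe_b"], ["hotel_a", "cafe_b", "cafe_b"])
```

In this example two of the three predicted mentions can be matched to gold mentions, so precision and recall are both 2/3.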

【2】 EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation
Link: https://arxiv.org/abs/2109.07222

Authors: Chenhe Dong, Guangrun Wang, Hang Xu, Jiefeng Peng, Xiaozhe Ren, Xiaodan Liang
Affiliations: Shenzhen Campus of Sun Yat-sen University; University of Oxford; Huawei Noah's Ark Lab; DarkMatter AI Research
Comments: Findings of EMNLP 2021
Abstract: Pre-trained language models have shown remarkable results on various NLP tasks. Nevertheless, due to their bulky size and slow inference speed, it is hard to deploy them on edge devices. In this paper, we offer the critical insight that improving the feed-forward network (FFN) in BERT yields a higher gain than improving the multi-head attention (MHA), since the computational cost of the FFN is 2-3 times larger than that of the MHA. Hence, to compact BERT, we focus on designing an efficient FFN, in contrast to previous works that concentrate on MHA. Since the FFN comprises a multilayer perceptron (MLP) that is essential in BERT optimization, we further design a thorough search space towards an advanced MLP and apply a coarse-to-fine mechanism to search for an efficient BERT architecture. Moreover, to accelerate searching and enhance model transferability, we employ a novel warm-up knowledge distillation strategy at each search stage. Extensive experiments show that our searched EfficientBERT is 6.9x smaller and 4.4x faster than BERT_BASE, and performs competitively on the GLUE and SQuAD benchmarks. Concretely, EfficientBERT attains a 77.7 average score on the GLUE test set, 0.7 higher than MobileBERT_TINY, and achieves an 85.3/74.5 F1 score on the SQuAD v1.1/v2.0 dev sets, 3.2/2.7 higher than TinyBERT_4 even without data augmentation. The code is released at https://github.com/cheneydon/efficient-bert.

Clustering (1 paper)

【1】 Powered Hawkes-Dirichlet Process: Challenging Textual Clustering using a Flexible Temporal Prior
Link: https://arxiv.org/abs/2109.07170

Authors: Gaël Poux-Médard, Julien Velcin, Sabine Loudcher
Affiliations: ERIC Lab, Université de Lyon, Lyon, France
Abstract: The textual content of a document and its publication date are intertwined. For example, the publication of a news article on a topic is influenced by previous publications on similar issues, according to underlying temporal dynamics. However, it can be challenging to retrieve meaningful information when the text conveys little information or when the temporal dynamics are hard to unveil. Furthermore, the textual content of a document is not always linked to its temporal dynamics. We develop a flexible method to create clusters of textual documents according to both their content and publication time, the Powered Dirichlet-Hawkes process (PDHP). We show that PDHP yields significantly better results than state-of-the-art models when temporal information or textual content is weakly informative. PDHP also relaxes the hypothesis that textual content and temporal dynamics are always perfectly correlated: when they are not, PDHP allows retrieving textual clusters, temporal clusters, or a mixture of both with high accuracy. We demonstrate that PDHP generalizes previous work, such as the Dirichlet-Hawkes process (DHP) and the Uniform process (UP). Finally, we illustrate the changes induced by PDHP over DHP and UP in a real-world application using Reddit data.

Autonomous Driving | Vehicles | Lane Detection, etc. (1 paper)

【1】 Risk Measurement, Risk Entropy, and Autonomous Driving Risk Modeling
Link: https://arxiv.org/abs/2109.07211

Authors: Jiamin Yu
Comments: 11 pages, 5 figures, IME 2021
Abstract: Big data from autonomous vehicles has long been used for the perception, prediction, planning, and control of driving. Naturally, it is increasingly asked why this big data is not also used for risk management and actuarial modeling. This article examines the emerging technical difficulties, new ideas, and methods of risk modeling under autonomous driving scenarios. Compared with the traditional risk model, the novel model is more consistent with real road traffic and driving safety performance. More importantly, it provides technical feasibility for realizing risk assessment and car insurance pricing in a computer simulation environment.

Federated Learning | Privacy Protection | Encryption (1 paper)

【1】 Federated Learning of Molecular Properties in a Heterogeneous Setting
Link: https://arxiv.org/abs/2109.07258

Authors: Wei Zhu, Andrew White, Jiebo Luo
Affiliations: Department of Computer Science, University of Rochester; Department of Chemical Engineering, University of Rochester
Abstract: Conducting experiments in chemistry research incurs high material and computational costs. Institutions thus consider chemical data to be valuable, and there have been few efforts to construct large public datasets for machine learning. Another challenge is that different institutions are interested in different classes of molecules, creating heterogeneous data that cannot be easily joined by conventional distributed training. In this work, we introduce federated heterogeneous molecular learning to address these challenges. Federated learning allows end users to build a global model collaboratively while keeping the training data distributed over isolated clients. Due to the lack of related research, we first simulate a federated heterogeneous benchmark called FedChem. FedChem is constructed by jointly performing scaffold splitting and Latent Dirichlet Allocation on existing datasets. Our results on FedChem show that significant learning challenges arise when working with heterogeneous molecules. We then propose a method to alleviate the problem, namely Federated Learning by Instance reweighTing (FLIT). FLIT can align the local training across heterogeneous clients by improving the performance for uncertain samples. Comprehensive experiments conducted on our new benchmark FedChem validate the advantages of this method over other federated learning schemes. FedChem should enable a new type of collaboration for improving AI in chemistry that mitigates concerns about valuable chemical data.

Reasoning | Analysis | Understanding | Explanation (3 papers)

【1】 FORTAP: Using Formulae for Numerical-Reasoning-Aware Table Pretraining
Link: https://arxiv.org/abs/2109.07323

Authors: Zhoujun Cheng, Haoyu Dong, Fan Cheng, Ran Jia, Pengfei Wu, Shi Han, Dongmei Zhang
Comments: Work in progress
Abstract: Tables store rich numerical data, but numerical reasoning over tables is still a challenge. In this paper, we find that spreadsheet formulae, which perform calculations on numerical values in tables, are a naturally strong source of supervision for numerical reasoning. More importantly, large numbers of spreadsheets with expert-made formulae are available on the web and can be obtained easily. FORTAP is the first method for numerical-reasoning-aware table pretraining that leverages a large corpus of spreadsheet formulae. We design two formula pretraining tasks to explicitly guide FORTAP to learn numerical reference and calculation in semi-structured tables. FORTAP achieves state-of-the-art results on two representative downstream tasks, cell type classification and formula prediction, showing the great potential of numerical-reasoning-aware pretraining.

【2】 Internet of Behavior (IoB) and Explainable AI Systems for Influencing IoT Behavior
Link: https://arxiv.org/abs/2109.07239

Authors: Haya Elayan, Moayad Aloqaily, Mohsen Guizani
Comments: Submitted to IEEE Network
Abstract: Pandemics and natural disasters over the years have changed the behavior of people, which has had a tremendous impact on all aspects of life. Governments, organizations, and companies have used the technologies available in each era to track, control, and influence the behavior of individuals for their own benefit. Nowadays, the use of the Internet of Things (IoT), cloud computing, and artificial intelligence (AI) has made it easier to track and change the behavior of users by changing IoT behavior. This article introduces and discusses the concept of the Internet of Behavior (IoB) and its integration with Explainable AI (XAI) techniques to provide a trusted and evident experience in the process of changing IoT behavior to ultimately improve users' behavior. To this end, a system based on IoB and XAI is proposed in a use case scenario of electrical power consumption that aims to influence users' consumption behavior to reduce power consumption and cost. The scenario results showed a decrease of 522.2 kW of active power compared to the original consumption over a 200-hour period. They also showed a total power cost saving of 95.04 Euro for the same period. Moreover, decreasing the global active power will reduce the power intensity through the positive correlation between the two.

【3】 Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Streaming Data
Link: https://arxiv.org/abs/2109.07117

Authors: Antoine Godichon-Baggioni, Nicklas Werge, Olivier Wintenberger
Affiliations: LPSM, Sorbonne Université, place Jussieu, Paris, France
Abstract: Motivated by the continuous generation of high-frequency data streams, real-time learning is becoming increasingly important. These data streams should be processed sequentially, with the property that the stream may change over time. In this streaming setting, we propose techniques for minimizing a convex objective through unbiased estimates of its gradients, commonly referred to as stochastic approximation problems. Our methods rely on stochastic approximation algorithms because of their computational advantage: they only use the previous iterate as a parameter estimate. The reasoning includes iterate averaging, which guarantees optimal statistical efficiency under classical conditions. Our non-asymptotic analysis shows accelerated convergence when the learning rate is selected according to the expected data streams. We show that the averaged estimate converges optimally and robustly for any data stream rate. In addition, noise reduction can be achieved by processing the data in a specific pattern, which is advantageous for large-scale machine learning. These theoretical results are illustrated for various data streams, showing the effectiveness of the proposed algorithms.
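The combination of stochastic approximation with iterate averaging described above can be sketched on a toy problem. The example below is a generic averaged stochastic gradient loop, not the authors' algorithm: it estimates a mean by minimizing the convex objective 0.5 * (theta - x)^2 over a stream, and the function name `averaged_sgd`, the step-size schedule c * t^(-alpha), and the default values are assumptions chosen for illustration.

```python
import random


def averaged_sgd(stream, theta0=0.0, c=1.0, alpha=0.66):
    """Run SGD on 0.5 * (theta - x)^2 over a stream and average the iterates."""
    theta = theta0       # last iterate
    theta_bar = theta0   # running average of the iterates
    for t, x in enumerate(stream, start=1):
        grad = theta - x                      # unbiased gradient estimate from one sample
        theta -= c * t ** (-alpha) * grad     # decaying learning rate
        theta_bar += (theta - theta_bar) / t  # online update of the average
    return theta, theta_bar


rng = random.Random(0)
samples = (rng.gauss(2.0, 1.0) for _ in range(5000))  # hypothetical data stream
last_iterate, averaged_iterate = averaged_sgd(samples)
```

Each observation is used once and then discarded, so memory use stays constant however long the stream runs. On this toy problem the averaged iterate typically fluctuates less around the true mean than the last iterate does.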

Detection (2 papers)

【1】 Assisting the Human Fact-Checkers: Detecting All Previously Fact-Checked Claims in a Document
Link: https://arxiv.org/abs/2109.07410

Authors: Shaden Shaar, Firoj Alam, Giovanni Da San Martino, Preslav Nakov
Affiliations: Qatar Computing Research Institute, HBKU, Doha, Qatar; University of Padova, Italy
Comments: detecting previously fact-checked claims, fact-checking, disinformation, fake news, social media, political debates
Abstract: Given the recent proliferation of false claims online, there has been a lot of manual fact-checking effort. As this is very time-consuming, human fact-checkers can benefit from tools that support them and make them more efficient. Here, we focus on building a system that could provide such support. Given an input document, it aims to detect all sentences that contain a claim that can be verified by some previously fact-checked claims (from a given database). The output is a re-ranked list of the document sentences, so that those that can be verified are ranked as high as possible, together with corresponding evidence. Unlike previous work, which has looked into claim retrieval, here we take a document-level perspective. We create a new manually annotated dataset for the task, and we propose suitable evaluation measures. We further experiment with a learning-to-rank approach, achieving sizable performance gains over several strong baselines. Our analysis demonstrates the importance of modeling text similarity and stance, while also taking into account the veracity of the retrieved previously fact-checked claims. We believe that this research would be of interest to fact-checkers, journalists, media, and regulatory authorities.

【2】 Photon detection probability prediction using one-dimensional generative neural network
Link: https://arxiv.org/abs/2109.07277

Authors: Wei Mu, Alexander I. Himmel, Bryan Ramson
Affiliations: Neutrino Division, Fermi National Accelerator Laboratory, Wilson Street and Kirk Road, Batavia, Illinois, U.S.A.
Abstract: Photon detection is important for liquid argon detectors used in direct dark matter searches or neutrino property measurements. Precise simulation of photon transport is widely used to understand the probability of photon detection in liquid argon detectors. Traditional photon transport simulation, which tracks every photon using the Geant4 simulation toolkit, is a major computational challenge for kilo-tonne-scale liquid argon detectors and GeV-level energy depositions. In this work, we propose a one-dimensional generative model which efficiently generates features using an OuterProduct-layer. This model bypasses photon transport simulation and predicts the number of photons detected by particular photon detectors at the same level of detail as the Geant4 simulation. The application to simulating photon detection systems in kilo-tonne-scale liquid argon detectors demonstrates that this novel generative model is able to reproduce the Geant4 simulation with good accuracy while running 20 to 50 times faster. This generative model can be used to quickly predict photon detection probability in huge liquid argon detectors like ProtoDUNE or DUNE.

Classification | Recognition (1 paper)

【1】 Embedding Convolutions for Short Text Extreme Classification with Millions of Labels
Link: https://arxiv.org/abs/2109.07319

Authors: Siddhant Kharbanda, Atmadeep Banerjee, Akash Palrecha, Rohit Babbar
Affiliations: Department of Computer Science, Aalto University
Abstract: Automatic annotation of short-text data with a large number of target labels, referred to as Short Text Extreme Classification, has recently found numerous applications in the prediction of related searches and in product recommendation tasks. The conventional use of Convolutional Neural Networks (CNNs) to capture n-grams in text classification relies heavily on uniformity in word ordering and on the presence of long input sequences to convolve over. However, both are missing in the short and unstructured text sequences encountered in search and recommendation. To tackle this, we propose an orthogonal approach that recasts the convolution operation to capture coupled semantics along the embedding dimensions, and we develop a word-order-agnostic embedding enhancement module to deal with the lack of structure in such queries. Benefiting from the computational efficiency of the convolution operation, Embedding Convolutions, when applied to the enriched word embeddings, result in a lightweight yet powerful encoder (InceptionXML) that is robust to the inherent lack of structure in short-text extreme classification. To scale our model to problems with millions of labels, we also propose InceptionXML+, which addresses the shortcomings of the dynamic hard-negative mining framework in the recently proposed LightXML by improving the alignment between the label shortlister and the extreme classifier. On popular benchmark datasets, we empirically demonstrate that the proposed method outperforms state-of-the-art deep extreme classifiers such as Astec by an average of 5% and 8% on the P@k and propensity-scored PSP@k metrics, respectively.

Representation (2 papers)

【1】 Comparing Text Representations: A Theory-Driven Approach
Link: https://arxiv.org/abs/2109.07458

Authors: Gregory Yauney, David Mimno
Affiliations: Cornell University
Abstract: Much of the progress in contemporary NLP has come from learning representations, such as masked language model (MLM) contextual embeddings, that turn challenging problems into simple classification tasks. But how do we quantify and explain this effect? We adapt general tools from computational learning theory to fit the specific characteristics of text datasets and present a method to evaluate the compatibility between representations and tasks. Even though many tasks can be easily solved with simple bag-of-words (BOW) representations, BOW does poorly on hard natural language inference tasks. For one such task, we find that BOW cannot distinguish between real and randomized labelings, while pre-trained MLM representations show 72x greater distinction between real and random labelings than BOW. This method provides a calibrated, quantitative measure of the difficulty of a classification-based NLP task, enabling comparisons between representations without requiring empirical evaluations that may be sensitive to initializations and hyperparameters. The method provides a fresh perspective on the patterns in a dataset and the alignment of those patterns with specific labels.

【2】 Deep Bregman Divergence for Contrastive Learning of Visual Representations
Link: https://arxiv.org/abs/2109.07455

Authors: Mina Rezaei, Farzin Soleymani, Bernd Bischl, Shekoofeh Azizi
Affiliations: Department of Statistics, LMU Munich, Germany; Department of Electrical and Computer Engineering, Technical University of Munich, Germany; Google Research, United States
Abstract: Deep Bregman divergence measures the divergence of data points using neural networks; it goes beyond Euclidean distance and is capable of capturing divergence over distributions. In this paper, we propose deep Bregman divergences for contrastive learning of visual representations, and we aim to enhance the contrastive loss used in self-supervised learning by training additional networks based on functional Bregman divergence. In contrast to conventional contrastive learning methods, which are based solely on divergences between single points, our framework can capture the divergence between distributions, which improves the quality of the learned representation. By combining the conventional contrastive loss with the proposed divergence loss, our method outperforms the baseline and most previous methods for self-supervised and semi-supervised learning on multiple classification and object detection tasks and datasets. The source code of the method and of all the experiments is available in the supplementary material.
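For readers unfamiliar with the term, the classical Bregman divergence generated by a convex function phi is D(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>. The sketch below implements only this textbook definition on plain Python lists; it is not the deep or functional variant from the paper, and all function names are illustrative.

```python
import math


def bregman_divergence(phi, grad_phi, x, y):
    """Classical Bregman divergence: phi(x) - phi(y) - <grad_phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner


def sq_norm(v):
    """Squared Euclidean norm; generates the squared Euclidean distance."""
    return sum(vi * vi for vi in v)


def sq_norm_grad(v):
    return [2.0 * vi for vi in v]


def neg_entropy(p):
    """Negative Shannon entropy; generates the KL divergence on probability vectors."""
    return sum(pi * math.log(pi) for pi in p)


def neg_entropy_grad(p):
    return [math.log(pi) + 1.0 for pi in p]


euclid = bregman_divergence(sq_norm, sq_norm_grad, [1.0, 2.0], [3.0, 0.0])
kl = bregman_divergence(neg_entropy, neg_entropy_grad, [0.5, 0.5], [0.25, 0.75])
```

The two generators show why the family is broader than Euclidean distance: the same formula yields the squared distance between points for `sq_norm` and the Kullback-Leibler divergence between distributions for `neg_entropy`.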

Optimization | Convergence (4 papers)

【1】 DROMO: Distributionally Robust Offline Model-based Policy Optimization
Link: https://arxiv.org/abs/2109.07275

Authors: Ruizhen Liu, Dazhi Zhong, Zhicong Chen
Affiliations: Affiliated High School of South China Normal University
Comments: Under review of S.-T. Yau Award 2021 of Computer Science
Abstract: We consider the problem of offline reinforcement learning with model-based control, whose goal is to learn a dynamics model from the experience replay and obtain a pessimism-oriented agent under the learned model. Current model-based constraints include an explicit uncertainty penalty and implicit conservative regularization, which pushes the Q-values of out-of-distribution state-action pairs down and those of in-distribution pairs up. While the uncertainty estimation on which the former relies can be loosely calibrated for complex dynamics, the latter performs slightly better. To extend the basic idea of regularization without uncertainty quantification, we propose distributionally robust offline model-based policy optimization (DROMO), which leverages ideas from distributionally robust optimization to penalize a broader range of out-of-distribution state-action pairs beyond the standard empirical out-of-distribution Q-value minimization. We show theoretically that our method optimizes a lower bound on the ground-truth policy evaluation and that it can be incorporated into any existing policy gradient algorithm. We also analyze the theoretical properties of DROMO's linear and non-linear instantiations.

【2】 Convergence of a Human-in-the-Loop Policy-Gradient Algorithm With Eligibility Trace Under Reward, Policy, and Advantage Feedback
Link: https://arxiv.org/abs/2109.07054

Authors: Ishaan Shah, David Halpern, Kavosh Asadi, Michael L. Littman
Affiliations: Department of Computer Science, Brown University
Comments: Accepted into ICML 2021 workshops Human-AI Collaboration in Sequential Decision-Making and Human in the Loop Learning
Abstract: Fluid human-agent communication is essential for the future of human-in-the-loop reinforcement learning. An agent must respond appropriately to feedback from its human trainer even before they have significant experience working together. Therefore, it is important that learning agents respond well to the various feedback schemes human trainers are likely to provide. This work analyzes the COnvergent Actor-Critic by Humans (COACH) algorithm under three different types of feedback: policy feedback, reward feedback, and advantage feedback. For these three feedback types, we find that COACH can behave sub-optimally. We propose a variant of COACH, episodic COACH (E-COACH), which we prove converges for all three types. We compare our COACH variant with two other reinforcement-learning algorithms: Q-learning and TAMER.

【3】 Neural network optimal feedback control with enhanced closed loop stability
标题：闭环稳定性增强的神经网络最优反馈控制
链接：https://arxiv.org/abs/2109.07466

作者：Tenavi Nakamura-Zimmerer,Qi Gong,Wei Kang
摘要：最近的研究表明，监督学习是设计高维非线性动态系统最优反馈控制器的有效工具。但是这些神经网络（NN）控制器的行为仍然没有得到很好的理解。在本文中，我们使用数值模拟来证明典型的测试精度指标不能有效地捕获神经网络控制器稳定系统的能力。特别是，一些具有高测试精度的神经网络可能无法稳定动力学。为了解决这个问题，我们提出了两种局部近似线性二次调节器（LQR）的神经网络结构。数值模拟证实了我们的直觉，即所提出的结构在不牺牲性能的情况下可靠地产生稳定反馈控制器。此外，我们还介绍了描述此类神经网络控制系统稳定性的初步理论结果。
摘要：Recent research has shown that supervised learning can be an effective tool for designing optimal feedback controllers for high-dimensional nonlinear dynamic systems. But the behavior of these neural network (NN) controllers is still not well understood. In this paper we use numerical simulations to demonstrate that typical test accuracy metrics do not effectively capture the ability of an NN controller to stabilize a system. In particular, some NNs with high test accuracy can fail to stabilize the dynamics. To address this we propose two NN architectures which locally approximate a linear quadratic regulator (LQR). Numerical simulations confirm our intuition that the proposed architectures reliably produce stabilizing feedback controllers without sacrificing performance. In addition, we introduce a preliminary theoretical result describing some stability properties of such NN-controlled systems.
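For readers unfamiliar with the linear quadratic regulator mentioned above, here is a minimal sketch of the LQR gain for a scalar discrete-time system, computed by iterating the Riccati recursion. It illustrates only the classical LQR baseline, not the paper's NN architectures. The system `x[k+1] = a*x[k] + b*u[k]`, the cost weights `q` and `r`, and all numeric values are assumptions chosen for illustration.

```python
def scalar_lqr_gain(a, b, q, r, iters=200):
    """Iterate the scalar discrete-time Riccati recursion and return the
    feedback gain k (control law u = -k * x) and the cost-to-go weight p."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = a * b * p / (r + b * b * p)
    return k, p


# Hypothetical unstable open-loop system (|a| > 1).
a, b, q, r = 1.2, 1.0, 1.0, 1.0
k, p = scalar_lqr_gain(a, b, q, r)
closed_loop = a - b * k

# Simulate the closed loop from x = 1 for 30 steps.
x = 1.0
for _ in range(30):
    x = closed_loop * x

print(k, closed_loop, x)
```

The open-loop pole `a` lies outside the unit circle, while the closed-loop pole `a - b*k` lies inside it, so the state decays toward zero. That stabilizing property is what the abstract says a test-accuracy metric alone may fail to capture.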

【4】 Learning and Decision-Making with Data: Optimal Formulations and Phase Transitions
标题：数据学习与决策：最优公式与相变
链接：https://arxiv.org/abs/2109.06911

作者：M. Amine Bennouna,Bart P. G. Van Parys
摘要：我们研究了当只有历史数据可用时设计最优学习和决策公式的问题。以前的工作通常致力于特定类别的数据驱动公式，并随后尝试建立样本外性能保证。我们在这里采取相反的方法。我们首先定义一个合理的标尺，用它来衡量任何数据驱动公式的质量，然后寻求一个最优的公式。非正式地说，任何数据驱动的公式都可以被视为平衡估计成本与实际成本之间的接近度，同时保证样本外性能水平。给定一个可接受的样本外性能水平，我们明确构建了一个数据驱动的公式，该公式比任何其他具有相同样本外性能的公式都更接近真实成本。我们证明了存在三种不同的样本外性能区域（超指数区域、指数区域和亚指数区域），在这三种区域之间，最优数据驱动公式的性质经历了相变。最优数据驱动公式可以解释为超指数区域的经典稳健公式、指数区域的熵分布稳健公式以及亚指数区域的方差惩罚公式。这最后的观察揭示了这三个乍一看似乎不相关的数据驱动公式之间令人惊讶的联系，而这一联系直到现在仍然隐藏着。
摘要：We study the problem of designing optimal learning and decision-making formulations when only historical data is available. Prior work typically commits to a particular class of data-driven formulation and subsequently tries to establish out-of-sample performance guarantees. Here we take the opposite approach. We first define a sensible yardstick with which to measure the quality of any data-driven formulation and subsequently seek to find an optimal such formulation. Informally, any data-driven formulation can be seen to balance a measure of proximity of the estimated cost to the actual cost while guaranteeing a level of out-of-sample performance. Given an acceptable level of out-of-sample performance, we construct explicitly a data-driven formulation that is uniformly closer to the true cost than any other formulation enjoying the same out-of-sample performance. We show the existence of three distinct out-of-sample performance regimes (a superexponential regime, an exponential regime and a subexponential regime) between which the nature of the optimal data-driven formulation experiences a phase transition. The optimal data-driven formulations can be interpreted as a classically robust formulation in the superexponential regime, an entropic distributionally robust formulation in the exponential regime and finally a variance penalized formulation in the subexponential regime. This final observation unveils a surprising connection between these three, at first glance seemingly unrelated, data-driven formulations which until now remained hidden.

预测|估计(2篇)

【1】 CAMul: Calibrated and Accurate Multi-view Time-Series Forecasting
标题：CAMul：校准准确的多视图时间序列预测
链接：https://arxiv.org/abs/2109.07438

作者：Harshavardhan Kamarthi,Lingkai Kong,Alexander Rodríguez,Chao Zhang,B. Aditya Prakash
机构：College of Computing, Georgia Institute of Technology
备注：16 pages, 4 figures
摘要：概率时间序列预测可以跨多个领域进行可靠的决策。大多数预测问题的数据来源多种多样，包含多种模式和结构。利用来自这些数据源的信息和不确定性进行校准准确的预测是一个重要的挑战性问题。以前关于多模态学习和预测的大多数工作只是通过简单的求和或连接方法从每个数据视图聚合中间表示，并且没有明确地为每个数据视图建模不确定性。我们提出了一个通用的概率多视图预测框架CAMul，它可以从不同的数据源中学习表示和不确定性。它以动态上下文特定的方式集成来自每个数据视图的知识和不确定性，赋予有用视图更大的重要性，以建模经过良好校准的预测分布。我们将CAMul用于具有不同来源和模式的多个领域，并表明CAMul在精度和校准方面优于其他最先进的概率预测模型25%。
摘要：Probabilistic time-series forecasting enables reliable decision making across many domains. Most forecasting problems have diverse sources of data containing multiple modalities and structures. Leveraging information as well as uncertainty from these data sources for well-calibrated and accurate forecasts is an important and challenging problem. Most previous work on multi-modal learning and forecasting simply aggregates intermediate representations from each data view by simple methods of summation or concatenation and does not explicitly model uncertainty for each data view. We propose a general probabilistic multi-view forecasting framework, CAMul, that can learn representations and uncertainty from diverse data sources. It integrates the knowledge and uncertainty from each data view in a dynamic, context-specific manner, assigning more importance to useful views to model a well-calibrated forecast distribution. We use CAMul for multiple domains with varied sources and modalities and show that CAMul outperforms other state-of-the-art probabilistic forecasting models by over 25% in accuracy and calibration.

【2】 Multi View Spatial-Temporal Model for Travel Time Estimation
标题：行程时间估计的多视点时空模型
链接：https://arxiv.org/abs/2109.07402

作者：ZiChuan Liu,Zhaoyang Wu,Meng Wang
机构：Wuhan University of Technology, Wuhan, China, East China Normal University, Shanghai, China, Sun Yat-sen University, Guangzhou, China
摘要：出租车到达时间预测是构建智能交通系统的重要组成部分。传统的到达时间估计方法主要依赖于交通地图特征提取，不能对复杂情况和非线性时空关系建模。因此，我们提出了一种多视角时空模型（MVSTM）来捕捉时空和轨迹的相关性。具体来说，我们使用graph2vec对空间视图进行建模，双通道时间模块对轨迹视图进行建模，结构嵌入对交通语义进行建模。对大规模出租车轨迹数据的实验表明，该方法比新方法更有效。源代码可以从 https://github.com/775269512/SIGSPATIAL-2021-GISCUP-4th-Solution 获取。
摘要：Taxi arrival time prediction is an essential part of building intelligent transportation systems. Traditional arrival time estimation methods mainly rely on traffic map feature extraction, which cannot model complex situations and nonlinear spatial and temporal relationships. Therefore, we propose a Multi-View Spatial-Temporal Model (MVSTM) to capture spatial-temporal and trajectory dependencies. Specifically, we use graph2vec to model the spatial view, a dual-channel temporal module to model the trajectory view, and structural embedding to model the traffic semantics. Experiments on large-scale taxi trajectory data show that our approach is more effective than the novel method. The source code can be obtained from https://github.com/775269512/SIGSPATIAL-2021-GISCUP-4th-Solution.

其他神经网络|深度学习|模型|建模(12篇)

【1】 Challenges in Detoxifying Language Models
标题：语言模型解毒面临的挑战
链接：https://arxiv.org/abs/2109.07445

作者：Johannes Welbl,Amelia Glaese,Jonathan Uesato,Sumanth Dathathri,John Mellor,Lisa Anne Hendricks,Kirsty Anderson,Pushmeet Kohli,Ben Coppin,Po-Sen Huang
机构：DeepMind
备注：23 pages, 6 figures, published in Findings of EMNLP 2021
摘要：大型语言模型（LM）生成非常流畅的文本，可以有效地适应NLP任务。从安全角度衡量和保证生成文本的质量对于在现实世界中部署LM来说至关重要；为此，之前的工作通常依赖于LM毒性的自动评估。我们批判性地讨论了这一方法，评估了自动和人工评估方面的几种毒性缓解策略，并根据模型偏差和LM质量分析了毒性缓解的后果。我们证明，虽然基本干预策略可以有效地优化先前在RealToxicityPrompts数据集上建立的自动指标，但这是以降低LM对有关边缘化群体的文本及其方言的覆盖率为代价的。此外，我们发现，在强烈的毒性降低干预措施后，人类评分员往往不同意高自动毒性评分，这进一步突出了仔细评估LM毒性所涉及的细微差别。
摘要：Large language models (LM) generate remarkably fluent text and can be efficiently adapted across NLP tasks. Measuring and guaranteeing the quality of generated text in terms of safety is imperative for deploying LMs in the real world; to this end, prior work often relies on automatic evaluation of LM toxicity. We critically discuss this approach, evaluate several toxicity mitigation strategies with respect to both automatic and human evaluation, and analyze consequences of toxicity mitigation in terms of model bias and LM quality. We demonstrate that while basic intervention strategies can effectively optimize previously established automatic metrics on the RealToxicityPrompts dataset, this comes at the cost of reduced LM coverage for both texts about, and dialects of, marginalized groups. Additionally, we find that human raters often disagree with high automatic toxicity scores after strong toxicity reduction interventions -- highlighting further the nuances involved in careful evaluation of LM toxicity.

【2】 Can one hear the shape of a neural network?: Snooping the GPU via Magnetic Side Channel
标题：人们能听到神经网络的形状吗？：通过磁侧通道窥探GPU
链接：https://arxiv.org/abs/2109.07395

作者：Henrique Teles Maia,Chang Xiao,Dingzeyu Li,Eitan Grinspun,Changxi Zheng
机构：Adobe Research, Columbia University, University of Toronto
备注：14 pages, accepted to USENIX Security 2022
摘要：神经网络应用在企业和个人环境中都变得很流行。网络解决方案针对每项任务都进行了精心调整，能够可靠地解决查询的设计最终会满足高需求。随着准确高效的机器学习模型的商业价值的增加，保护神经结构作为机密投资的需求也在增加。我们探讨了通过电磁侧通道在加速硬件上部署为黑匣子的神经网络的脆弱性。我们检查了由一个便宜的3美元感应传感器获得的图形处理单元电源线发出的磁通量，发现该信号泄露了黑箱神经网络模型的详细拓扑结构和超参数。该攻击为一个输入值未知但输入维度已知的查询获取磁信号。网络重建是可能的，因为模块化层序列中的深层神经网络进行评估。我们发现，每个层组件的评估产生了一个可识别的磁信号特征，从该特征中，可以使用经过适当训练的分类器和基于整数规划的联合一致性优化来推断层拓扑、宽度、函数类型和序列顺序。我们研究在何种程度上可以恢复网络规格，并考虑度量比较网络相似度。我们展示了这种边通道攻击在恢复包括随机设计在内的广泛网络体系结构细节方面的潜在准确性。我们考虑应用可能会利用这种新颖的侧通道曝光，例如对抗转移攻击。作为回应，我们讨论了防范我们的方法和其他类似窥探技术的对策。
摘要：Neural network applications have become popular in both enterprise and personal settings. Network solutions are tuned meticulously for each task, and designs that can robustly resolve queries end up in high demand. As the commercial value of accurate and performant machine learning models increases, so too does the demand to protect neural architectures as confidential investments. We explore the vulnerability of neural networks deployed as black boxes across accelerated hardware through electromagnetic side channels. We examine the magnetic flux emanating from a graphics processing unit's power cable, as acquired by a cheap $3 induction sensor, and find that this signal betrays the detailed topology and hyperparameters of a black-box neural network model. The attack acquires the magnetic signal for one query with unknown input values, but known input dimensions. The network reconstruction is possible due to the modular layer sequence in which deep neural networks are evaluated. We find that each layer component's evaluation produces an identifiable magnetic signal signature, from which layer topology, width, function type, and sequence order can be inferred using a suitably trained classifier and a joint consistency optimization based on integer programming. We study the extent to which network specifications can be recovered, and consider metrics for comparing network similarity. We demonstrate the potential accuracy of this side channel attack in recovering the details for a broad range of network architectures, including random designs. We consider applications that may exploit this novel side channel exposure, such as adversarial transfer attacks. In response, we discuss countermeasures to protect against our method and other similar snooping techniques.

【3】 Self-learn to Explain Siamese Networks Robustly
标题：通过自学习稳健地解释孪生网络
链接：https://arxiv.org/abs/2109.07371

作者：Chao Chen,Yifan Shen,Guixiang Ma,Xiangnan Kong,Srinivas Rangarajan,Xi Zhang,Sihong Xie
机构：Computer Science and Engineering Dept, Lehigh University, Laboratory of Trustworthy Distributed Computing and Service (MoE), BUPT, University of
备注：Accepted to ICDM 2021
摘要：在数字取证、人脸识别和大脑网络分析等应用中，学习比较两个对象是必不可少的，特别是在标记数据稀少且不平衡的情况下。由于这些应用程序会做出高风险决策，并涉及公平和透明等社会价值观，因此解释所学模型至关重要。我们的目的是研究在学习比较中广泛使用的暹罗网络（SN）的事后解释。我们描述了由于SN中额外的比较对象而导致的基于梯度的解释的不稳定性，这与具有单个输入实例的体系结构不同。我们提出了一个优化框架，利用自学习从未标记数据中获得全局不变性，以提高为特定查询引用对定制的局部解释的稳定性。优化问题可以使用梯度下降上升法（GDA）求解约束优化问题，或使用SGD求解KL散度正则化无约束优化问题，并进行收敛性证明，尤其是当目标函数由于暹罗体系结构而非凸时。对神经科学和化学工程的表格和图形数据的定量结果和案例研究表明，该框架尊重自学习不变性，同时有力地优化了解释的忠实性和简单性。我们通过实验进一步证明了GDA的收敛性。
摘要：Learning to compare two objects is essential in applications such as digital forensics, face recognition, and brain network analysis, especially when labeled data is scarce and imbalanced. As these applications make high-stakes decisions and involve societal values like fairness and transparency, it is critical to explain the learned models. We aim to study post-hoc explanations of Siamese networks (SN), which are widely used in learning to compare. We characterize the instability of gradient-based explanations due to the additional compared object in SN, in contrast to architectures with a single input instance. We propose an optimization framework that derives global invariance from unlabeled data using self-learning to promote the stability of local explanations tailored for specific query-reference pairs. The optimization problems can be solved using gradient descent-ascent (GDA) for constrained optimization, or SGD for KL-divergence regularized unconstrained optimization, with convergence proofs, especially when the objective functions are nonconvex due to the Siamese architecture. Quantitative results and case studies on tabular and graph data from neuroscience and chemical engineering show that the framework respects the self-learned invariance while robustly optimizing the faithfulness and simplicity of the explanation. We further demonstrate the convergence of GDA experimentally.

【4】 Cross-lingual Transfer of Monolingual Models
标题：单语模型的跨语言迁移
链接：https://arxiv.org/abs/2109.07348

作者：Evangelia Gogoulou,Ariel Ekgren,Tim Isbister,Magnus Sahlgren
机构：RISE, KTH, Peltarion
摘要：最近使用多语言模型进行的零样本跨语言学习研究推翻了先前的假设，即共享词汇和联合预训练是跨语言泛化的关键。受这一进展的启发，我们介绍了一种基于领域自适应的单语模型跨语言迁移方法。我们研究了从四种不同语言到英语的这种迁移的效果。我们在GLUE上的实验结果表明，无论源语言是什么，迁移模型都优于原生英语模型。通过探测迁移前后表征中编码的英语语言知识，我们发现语义信息保留自源语言，而句法信息是在迁移过程中习得的。此外，对源语言任务中迁移模型的评估结果表明，迁移后，迁移模型在源域中的性能下降。
摘要：Recent studies in zero-shot cross-lingual learning using multilingual models have falsified the previous hypothesis that shared vocabulary and joint pre-training are the keys to cross-lingual generalization. Inspired by this advancement, we introduce a cross-lingual transfer method for monolingual models based on domain adaptation. We study the effects of such transfer from four different languages to English. Our experimental results on GLUE show that the transferred models outperform the native English model independently of the source language. After probing the English linguistic knowledge encoded in the representations before and after transfer, we find that semantic information is retained from the source language, while syntactic information is learned during transfer. Additionally, the results of evaluating the transferred models in source language tasks reveal that their performance in the source domain deteriorates after transfer.

【5】 PoWareMatch: a Quality-aware Deep Learning Approach to Improve Human Schema Matching
标题：PoWareMatch：一种改进人类模式匹配的质量感知深度学习方法
链接：https://arxiv.org/abs/2109.07321

作者：Roee Shraga,Avigdor Gal
备注：Technical report of the paper PoWareMatch: a Quality-aware Deep Learning Approach to Improve Human Schema Matching, accepted to ACM Journal of Data and Information Quality (JDIQ), Special Issue on Deep Learning for Data Quality
摘要：模式匹配是任何数据集成过程的核心任务。在数据库、人工智能、语义网和数据挖掘领域进行了多年的研究，主要的挑战仍然是在数据概念（如数据库属性）之间生成高质量匹配的能力。在这项工作中，我们从一个新的角度研究人类作为配对者的行为，将配对创造作为一个过程进行研究。我们从这个角度分析了常用评估指标（精确性、召回率和f-测度）的动态性，并强调了无偏匹配的必要性，以支持这一分析。无偏匹配，一个新定义的概念，描述了人类决策代表图式对应关系的可靠评估这一共同假设，然而，它不是人类匹配者的固有属性。接下来，我们设计了PoWareMatch，它利用深度学习机制来校准和过滤符合匹配质量的人类匹配决策，然后将这些决策与算法匹配相结合，以生成更好的匹配结果。我们提供了一个经验证据，该证据建立在200多名人类匹配者在共同基准上的实验基础上，PoWareMatch很好地预测了通过额外的对应关系扩展匹配的好处，并生成高质量的匹配。此外，PoWareMatch的性能优于最先进的匹配算法。
摘要：Schema matching is a core task of any data integration process. Although it has been investigated in the fields of databases, AI, Semantic Web and data mining for many years, its main challenge remains the ability to generate quality matches among data concepts (e.g., database attributes). In this work, we examine a novel angle on the behavior of humans as matchers, studying match creation as a process. We analyze the dynamics of common evaluation measures (precision, recall, and f-measure) with respect to this angle and highlight the need for unbiased matching to support this analysis. Unbiased matching, a newly defined concept that describes the common assumption that human decisions represent reliable assessments of schemata correspondences, is, however, not an inherent property of human matchers. In what follows, we design PoWareMatch, which makes use of a deep learning mechanism to calibrate and filter human matching decisions adhering to the quality of a match, which are then combined with algorithmic matching to generate better match results. We provide empirical evidence, established based on an experiment with more than 200 human matchers over common benchmarks, that PoWareMatch predicts well the benefit of extending the match with an additional correspondence and generates high quality matches. In addition, PoWareMatch outperforms state-of-the-art matching algorithms.

【6】 End-to-End Learning of Flowchart Grounded Task-Oriented Dialogs
标题：基于流程图的任务型对话的端到端学习
链接：https://arxiv.org/abs/2109.07263

作者：Dinesh Raghu,Shantanu Agarwal,Sachindra Joshi,Mausam
机构： Indian Institute of Technology, New Delhi, India, IBM Research, New Delhi, India
备注：D. Raghu and S.Agarwal contributed equally to this work
摘要：我们在面向任务的对话（TOD）的端到端学习中提出了一个新问题，其中对话系统模仿故障诊断代理，该代理通过诊断用户的问题（例如，汽车无法启动）来帮助用户。此类对话基于特定领域的流程图，代理应该在对话过程中遵循这些流程图。我们的任务为神经TOD带来了新的技术挑战，例如，在没有明确注释的情况下将话语对应到流程图，在用户提出澄清问题时参考额外的手册页，以及在测试时遵循未见过的流程图的能力。我们发布了一个数据集（FloDial），它由基于12个不同故障排除流程图的2738个对话组成。我们还设计了一个神经模型FloNet，它使用一种检索增强生成体系结构来训练对话代理。我们的实验发现，FloNet可以对未见过的流程图进行零样本迁移，并为未来的研究建立了一个强基线。
摘要：We propose a novel problem within end-to-end learning of task-oriented dialogs (TOD), in which the dialog system mimics a troubleshooting agent who helps a user by diagnosing their problem (e.g., car not starting). Such dialogs are grounded in domain-specific flowcharts, which the agent is supposed to follow during the conversation. Our task exposes novel technical challenges for neural TOD, such as grounding an utterance to the flowchart without explicit annotation, referring to additional manual pages when the user asks a clarification question, and the ability to follow unseen flowcharts at test time. We release a dataset (FloDial) consisting of 2,738 dialogs grounded on 12 different troubleshooting flowcharts. We also design a neural model, FloNet, which uses a retrieval-augmented generation architecture to train the dialog agent. Our experiments find that FloNet can do zero-shot transfer to unseen flowcharts, and sets a strong baseline for future research.

【7】 Integrating Sensing and Communication in Cellular Networks via NR Sidelink
标题：通过NR侧链路集成蜂窝网络中的传感和通信
链接：https://arxiv.org/abs/2109.07253

作者：Dariush Salami,Ramin Hasibi,Stefano Savazzi,Tom Michoel,Stephan Sigg
机构：Department of Informatics, University of Bergen
备注：The paper is submitted to JSAC and it is still under review
摘要：射频传感是对接收到的电磁信号中的运动或环境感应模式的分析和解释，十多年来一直在积极研究。由于通过蜂窝通信系统的电磁信号无处不在，射频感应有可能成为一种通用的感应机制，应用于智能家居、零售、定位、手势识别、入侵检测等领域，现有的蜂窝网络装置可能同时用于通信和传感。这种通信和传感融合是未来通信网络的设想。我们建议使用NR侧链直接设备到设备通信，以在5G以上蜂窝通信系统中实现设备启动、灵活的感知能力。在本文中，我们专门研究了与基于侧链的射频传感相关的一个常见问题，即其角度和旋转依赖性。特别是，我们讨论了实现旋转不变性的毫米波点云数据的变换，以及在角度和距离不同的设备上基于这种旋转不变性输入的分布式处理。为了处理分布式数据，我们提出了一种基于图的编码器来捕获数据的时空特征，并提出了四种多角度学习方法。这些方法在一个新记录的公开数据集上进行比较，该数据集包括15名受试者，从8个角度记录21个手势。
摘要：RF-sensing, the analysis and interpretation of movement or environment-induced patterns in received electromagnetic signals, has been actively investigated for more than a decade. Since electromagnetic signals, through cellular communication systems, are omnipresent, RF sensing has the potential to become a universal sensing mechanism with applications in smart home, retail, localization, gesture recognition, intrusion detection, etc. Specifically, existing cellular network installations might be dual-used for both communication and sensing. Such communications and sensing convergence is envisioned for future communication networks. We propose the use of NR-sidelink direct device-to-device communication to achieve device-initiated, flexible sensing capabilities in beyond-5G cellular communication systems. In this article, we specifically investigate a common issue related to sidelink-based RF-sensing, which is its angle and rotation dependence. In particular, we discuss transformations of mmWave point-cloud data which achieve rotational invariance, as well as distributed processing based on such rotationally invariant inputs, at angle- and distance-diverse devices. To process the distributed data, we propose a graph-based encoder to capture spatio-temporal features of the data and propose four approaches for multi-angle learning. The approaches are compared on a newly recorded and openly available dataset comprising 15 subjects performing 21 gestures, which are recorded from 8 angles.

【8】 Learning Mathematical Properties of Integers
标题：学习整数的数学性质
链接：https://arxiv.org/abs/2109.07230

作者：Maria Ryskina,Kevin Knight
机构：Language Technologies Institute, Carnegie Mellon University, DiDi Labs
备注：BlackboxNLP 2021
摘要：在高维向量空间中嵌入单词在许多自然语言应用中被证明是有价值的。在这项工作中，我们研究了类似训练的整数嵌入是否能够捕获对数学应用有用的概念。我们探讨了数学知识的整数嵌入，并将其应用于一组数字推理任务，结果表明，通过学习数学序列数据的表示，我们可以大大改进从英语文本语料库中学习的数字嵌入。
摘要：Embedding words in high-dimensional vector spaces has proven valuable in many natural language applications. In this work, we investigate whether similarly-trained embeddings of integers can capture concepts that are useful for mathematical applications. We probe the integer embeddings for mathematical knowledge, apply them to a set of numerical reasoning tasks, and show that by learning the representations from mathematical sequence data, we can substantially improve over number embeddings learned from English text corpora.

【9】 Automatic Symmetry Discovery with Lie Algebra Convolutional Network
标题：基于李代数卷积网络的对称性自动发现
链接：https://arxiv.org/abs/2109.07103

作者：Nima Dehmamy,Robin Walters,Yanchen Liu,Dashun Wang,Rose Yu
机构：Northwestern University, Northeastern University, University of California San Diego
摘要：现有的连续群等变神经网络需要离散化或群表示。所有这些方法都需要关于群参数化的详细知识，并且不能学习全新的对称性。我们建议用李代数（无穷小生成元）代替李群。我们的模型，李代数卷积网络（L-conv）可以学习潜在的对称性，并且不需要对群进行离散化。我们证明了L-conv可以作为构造任何群等变结构的构造块。我们讨论了CNN和图卷积网络如何与适当的群相关联并可以表示为L-conv。我们还推导了单个L-conv层的均方误差损失，并发现了与物理学中使用的拉格朗日量的深刻关系，其中一些物理学有助于定义损失景观中的泛化和对称性。反过来，L-conv可用于为科学机器学习提出更一般的等变拟设（ansätze）。
摘要：Existing equivariant neural networks for continuous groups require discretization or group representations. All these approaches require detailed knowledge of the group parametrization and cannot learn entirely new symmetries. We propose to work with the Lie algebra (infinitesimal generators) instead of the Lie group. Our model, the Lie algebra convolutional network (L-conv), can learn potential symmetries and does not require discretization of the group. We show that L-conv can serve as a building block to construct any group equivariant architecture. We discuss how CNNs and Graph Convolutional Networks are related to and can be expressed as L-conv with appropriate groups. We also derive the MSE loss for a single L-conv layer and find a deep relation with Lagrangians used in physics, with some of the physics aiding in defining generalization and symmetries in the loss landscape. Conversely, L-conv could be used to propose more general equivariant ansätze for scientific machine learning.
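To make the phrase "Lie algebra (infinitesimal generators) instead of the Lie group" concrete, here is a minimal sketch showing that a single generator of planar rotations recovers a finite rotation through the matrix exponential. It illustrates only the standard generator-to-group relationship, not the L-conv layer itself. The helper names `matmul` and `mat_exp`, the truncated Taylor series, and the choice of a 90-degree angle are assumptions made for illustration.

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=30):
    """Approximate exp(m) with a truncated Taylor series: sum of m^k / k!."""
    n = len(m)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms):
        # term becomes m^k / k! by multiplying the previous term by m / k.
        term = [[v / k for v in row] for row in matmul(term, m)]
        result = [[r + t for r, t in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    return result


# Infinitesimal generator of rotations in the plane (the Lie algebra so(2)).
GENERATOR = [[0.0, -1.0], [1.0, 0.0]]

theta = math.pi / 2
rotation = mat_exp([[theta * v for v in row] for row in GENERATOR])
print(rotation)
```

The result matches the usual rotation matrix with entries `cos(theta)` and `sin(theta)` up to floating-point error. Working with the generator means that only one small matrix has to be specified or learned, rather than a parametrization of every group element.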

【10】 Building Accurate Simple Models with Multihop
标题：用多跳技术构建精确的简单模型
链接：https://arxiv.org/abs/2109.06961

作者：Amit Dhurandhar,Tejaswini Pedapati
摘要：在过去几年中，将知识从复杂的高性能模型迁移到更简单且可能性能较低的模型以提高其性能一直备受关注，因为它在重要问题中得到了应用，如可解释人工智能、模型压缩、稳健的模型构建和小数据学习。解决此问题的已知方法（即知识蒸馏、模型压缩、ProfWeight等）通常通过修改目标或重新加权训练示例（简单模型在其上训练）的方案，将信息直接（即在单跳/一跳中）从复杂模型传输到所选简单模型。在本文中，我们提出了一种元方法，通过动态选择和/或构建一系列复杂度降低的中间模型，将信息从复杂模型传递到简单模型，这些中间模型比原始复杂模型复杂度更低。我们的方法可以使用前面提到的任何一种方法在序列中的连续模型之间传输信息，也可以以单跳方式工作，从而推广这些方法。在真实数据的实验中，我们观察到，我们在1-hop上获得了不同模型选择的一致增益，平均超过2%，在特定情况下达到8%。我们还实证分析了多跳方法可能比传统的单跳方法更有益的条件，并报告了其他有趣的见解。据我们所知，这是第一个在给定单一高性能复杂模型的情况下提出这种多跳方法来进行知识迁移的工作，我们认为这是一个重要的方法论贡献。
摘要：Knowledge transfer from a complex high performing model to a simpler and potentially low performing one in order to enhance its performance has been of great interest over the last few years as it finds applications in important problems such as explainable artificial intelligence, model compression, robust model building and learning from small data. Known approaches to this problem (viz. Knowledge Distillation, Model compression, ProfWeight, etc.) typically transfer information directly (i.e. in a single/one hop) from the complex model to the chosen simple model through schemes that modify the target or reweight training examples on which the simple model is trained. In this paper, we propose a meta-approach where we transfer information from the complex model to the simple model by dynamically selecting and/or constructing a sequence of intermediate models of decreasing complexity that are less intricate than the original complex model. Our approach can transfer information between consecutive models in the sequence using any of the previously mentioned approaches as well as work in 1-hop fashion, thus generalizing these approaches. In the experiments on real data, we observe that we get consistent gains for different choices of models over 1-hop, which on average is more than 2% and reaches up to 8% in a particular case. We also empirically analyze conditions under which the multi-hop approach is likely to be beneficial over the traditional 1-hop approach, and report other interesting insights. To the best of our knowledge, this is the first work that proposes such a multi-hop approach to perform knowledge transfer given a single high performing complex model, making it, in our opinion, an important methodological contribution.

【11】 Quantitative reconstruction of defects in multi-layered bonded composites using fully convolutional network-based ultrasonic inversion
标题：基于全卷积网络超声反演的多层粘结复合材料缺陷定量重建
链接：https://arxiv.org/abs/2109.07284

作者：Jing Rao,Fangshu Yang,Huadong Mo,Stefan Kollmannsberger,Ernst Rank
摘要：超声方法在检测和表征多层复合材料中的缺陷方面有着巨大的潜在应用。然而，定量重建影响粘接完整性并严重降低组件强度的缺陷（如脱粘和吻接键）仍然具有挑战性。在这项工作中，提出了一种基于监督完全卷积网络（FCN）的超声方法来定量重建多层粘结复合材料中隐藏的缺陷。在该方法的训练过程中，FCN建立了从测量的超声波数据到相应的多层粘结复合材料速度模型的非线性映射。在预测过程中，使用从训练过程中获得的训练网络，直接从新测量的粘接复合材料超声数据重建速度模型。提出的基于FCN的反演方法能够自动提取多层复合材料中的有用特征。虽然这种方法在训练过程中计算量很大，但在线阶段的预测本身只需几秒钟。数值结果表明，基于FCN的超声反演方法能够准确地重建高对比度缺陷的超声速度模型，在粘接复合材料的在线检测中具有很大的潜力。
摘要：Ultrasonic methods have great potential for detecting and characterizing defects in multi-layered bonded composites. However, it remains challenging to quantitatively reconstruct defects, such as disbonds and kissing bonds, that influence the integrity of adhesive bonds and seriously reduce the strength of assemblies. In this work, an ultrasonic method based on the supervised fully convolutional network (FCN) is proposed to quantitatively reconstruct defects hidden in multi-layered bonded composites. In the training process of this method, an FCN establishes a non-linear mapping from measured ultrasonic data to the corresponding velocity models of multi-layered bonded composites. In the predicting process, the trained network obtained from the training process is used to directly reconstruct the velocity models from the new measured ultrasonic data of adhesively bonded composites. The presented FCN-based inversion method can automatically extract useful features in multi-layered composites. Although this method is computationally expensive in the training process, the prediction itself in the online phase takes only seconds. The numerical results show that the FCN-based ultrasonic inversion method is capable of accurately reconstructing ultrasonic velocity models of high-contrast defects, which has great potential for online detection of adhesively bonded composites.

【12】 Non-linear Independent Dual System (NIDS) for Discretization-independent Surrogate Modeling over Complex Geometries
标题：复杂几何上离散化独立代理建模的非线性独立对偶系统(NIDS)
链接：https://arxiv.org/abs/2109.07018

作者：James Duvall,Karthik Duraisamy,Shaowu Pan
机构：University of Michigan, Ann Arbor, MI
摘要：偏微分方程（PDE）的数值解需要昂贵的模拟，限制了其在设计优化程序、基于模型的控制或大规模反问题求解中的应用。现有的基于卷积神经网络的代理模型框架需要有损像素化和数据预处理，不适合实际工程应用。因此，我们提出了非线性独立对偶系统（NIDS），它是一种深度学习的替代模型，用于PDE解的离散化独立连续表示，并可用于具有复杂、可变几何和网格拓扑的域上的预测。NIDS利用隐式神经表示法，通过在线性输出层中结合案例参数网络和点空间网络的评估，在问题参数和空间坐标之间建立非线性映射，以实现状态预测。空间网络的输入特征包括物理坐标，通过最小距离函数求值来增加，以隐式编码问题几何体。整体输出层的形式导致了一个双重系统，其中映射中的每个项都是非线性和独立的。此外，我们提出了一个最小距离函数驱动的加权和NIDS模型，该模型使用共享参数网络在一定的限制条件下通过构造来强制边界条件。该框架用于预测非参数定义网格上复杂的参数定义几何体的解，得到的解比全阶模型快很多数量级。测试案例包括一个具有复杂几何结构和数据稀缺性的车辆空气动力学问题，通过一种训练方法实现，在这种方法中，随着训练的进行，会逐渐增加更多的案例。
摘要：Numerical solution of partial differential equations (PDEs) requires expensive simulations, limiting their application in design optimization routines, model-based control, or solution of large-scale inverse problems. Existing Convolutional Neural Network-based frameworks for surrogate modeling require lossy pixelization and data-preprocessing, which is not suitable for realistic engineering applications. Therefore, we propose non-linear independent dual system (NIDS), which is a deep learning surrogate model for discretization-independent, continuous representation of PDE solutions, and can be used for prediction over domains with complex, variable geometries and mesh topologies. NIDS leverages implicit neural representations to develop a non-linear mapping between problem parameters and spatial coordinates to state predictions by combining evaluations of a case-wise parameter network and a point-wise spatial network in a linear output layer. The input features of the spatial network include physical coordinates augmented by a minimum distance function evaluation to implicitly encode the problem geometry. The form of the overall output layer induces a dual system, where each term in the map is non-linear and independent. Further, we propose a minimum distance function-driven weighted sum of NIDS models using a shared parameter network to enforce boundary conditions by construction under certain restrictions. The framework is applied to predict solutions around complex, parametrically-defined geometries on non-parametrically-defined meshes, with solutions obtained many orders of magnitude faster than the full order models. Test cases include a vehicle aerodynamics problem with complex geometry and data scarcity, enabled by a training method in which more cases are gradually added as training progresses.

其他(24篇)

【1】 When Does Translation Require Context? A Data-driven, Multilingual Exploration
标题：翻译什么时候需要语境？数据驱动的多语言探索
链接：https://arxiv.org/abs/2109.07446

作者：Kayo Yin,Patrick Fernandes,André F. T. Martins,Graham Neubig
机构：Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, Instituto Superior Técnico & LUMLIS (Lisbon ELLIS Unit), Lisbon, Portugal, Instituto de Telecomunicações, Lisbon, Portugal, Unbabel, Lisbon, Portugal
摘要：尽管正确处理语篇现象对机器翻译（MT）的质量有着重要的影响，但通用的翻译质量指标并不能充分反映这些现象。最近在上下文感知机器翻译方面的工作试图在评估过程中针对这些现象的一小部分。在本文中，我们提出了一个新的度量标准P-CXMI，它允许我们系统地识别需要上下文的翻译，确认先前研究的现象的困难，以及发现以前工作中未解决的新现象。然后，我们开发了多语言语篇感知（MuDA）基准，这是14种不同语言对中这些现象的一系列标记，我们用它来评估上下文感知机器翻译。我们发现，最先进的上下文感知机器翻译模型在我们的基准上比上下文无关的模型有轻微的改进，这表明当前的模型不能有效地处理这些模糊性。我们发布代码和数据，邀请机器翻译研究界加大力度，对目前被忽视的话语现象和语言进行语境感知翻译。
摘要：Although proper handling of discourse phenomena significantly contributes to the quality of machine translation (MT), common translation quality metrics do not adequately capture them. Recent works in context-aware MT attempt to target a small set of these phenomena during evaluation. In this paper, we propose a new metric, P-CXMI, which allows us to identify translations that require context systematically and confirm the difficulty of previously studied phenomena as well as uncover new ones that have not been addressed in previous work. We then develop the Multilingual Discourse-Aware (MuDA) benchmark, a series of taggers for these phenomena in 14 different language pairs, which we use to evaluate context-aware MT. We find that state-of-the-art context-aware MT models find marginal improvements over context-agnostic models on our benchmark, which suggests current models do not handle these ambiguities effectively. We release code and data to invite the MT research community to increase efforts on context-aware translation on discourse phenomena and languages that are currently overlooked.

【2】 Should We Be Pre-training? An Argument for End-task Aware Training as an Alternative
标题：我们应该做预训练吗？以终端任务感知训练作为替代方案的论证
链接：https://arxiv.org/abs/2109.07437

作者：Lucio M. Dery,Paul Michel,Ameet Talwalkar,Graham Neubig
机构： Carnegie Mellon University, ENS PSL University, Determined AI
备注：16 pages, 4 figures
Abstract: Pre-training, where models are trained on an auxiliary objective with abundant data before being fine-tuned on data from the downstream task, is now the dominant paradigm in NLP. In general, the pre-training step relies on little to no direct knowledge of the task on which the model will be fine-tuned, even when the end-task is known in advance. Our work challenges this status quo of end-task agnostic pre-training. First, on three different low-resource NLP tasks from two domains, we demonstrate that multi-tasking the end-task and auxiliary objectives results in significantly better downstream task performance than the widely-used task-agnostic continued pre-training paradigm of Gururangan et al. (2020). We next introduce an online meta-learning algorithm that learns a set of multi-task weights to better balance among our multiple auxiliary objectives, achieving further improvements on end-task performance and data efficiency.

【3】 Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators
Link: https://arxiv.org/abs/2109.07419

Authors: Geonhwa Jeong, Gokcen Kestor, Prasanth Chatarasi, Angshuman Parashar, Po-An Tsai, Sivasankaran Rajamanickam, Roberto Gioiosa, Tushar Krishna
Affiliations: Georgia Institute of Technology
Notes: This paper is accepted to PACT 2021
Abstract: To meet the extreme compute demands for deep learning across commercial and scientific applications, dataflow accelerators are becoming increasingly popular. While these "domain-specific" accelerators are not fully programmable like CPUs and GPUs, they retain varying levels of flexibility with respect to data orchestration, i.e., dataflow and tiling optimizations to enhance efficiency. There are several challenges when designing new algorithms and mapping approaches to execute the algorithms for a target problem on new hardware. Previous works have addressed these challenges individually. To address these challenges as a whole, in this work, we present an HW-SW co-design ecosystem for spatial accelerators called Union within the popular MLIR compiler infrastructure. Our framework allows exploring different algorithms and their mappings on several accelerator cost models. Union also includes a plug-and-play library of accelerator cost models and mappers which can easily be extended. The algorithms and accelerator cost models are connected via a novel mapping abstraction that captures the map space of spatial accelerators, which can be systematically pruned based on constraints from the hardware, workload, and mapper. We demonstrate the value of Union for the community with several case studies which examine offloading different tensor operations (CONV/GEMM/Tensor Contraction) on diverse accelerator architectures using different mapping schemes.

【4】 Modular Neural Ordinary Differential Equations
Link: https://arxiv.org/abs/2109.07359

Authors: Max Zhu, P. Lio, Jacob Moss
Affiliations: University of Cambridge
Notes: 4 pages
Abstract: The laws of physics have been written in the language of differential equations for centuries. Neural Ordinary Differential Equations (NODEs) are a new machine learning architecture which allows these differential equations to be learned from a dataset. These have been applied to classical dynamics simulations in the form of Lagrangian Neural Networks (LNNs) and Second Order Neural Differential Equations (SONODEs). However, they either cannot represent the most general equations of motion or lack interpretability. In this paper, we propose Modular Neural ODEs, where each force component is learned with separate modules. We show how physical priors can be easily incorporated into these models. Through a number of experiments, we demonstrate that these models perform better, are more interpretable, and gain flexibility from their modularity.

【5】 Comparing decision mining approaches with regard to the meaningfulness of their results
Link: https://arxiv.org/abs/2109.07335

Authors: Beate Scheibel, Stefanie Rinderle-Ma
Affiliations: Research Group Workflow Systems and Technology, Computer Science, University of Vienna; Chair of Information Systems and Business Process Management, Department of Informatics, Technical University of Munich, Germany
Abstract: Decisions and the underlying rules are indispensable for driving process execution during runtime, i.e., for routing process instances at alternative branches based on the values of process data. Decision rules can comprise unary data conditions, e.g., age > 40, binary data conditions where the relation between two or more variables is relevant, e.g., temperature1 < temperature2, and more complex conditions that refer to, for example, parts of a medical image. Decision discovery aims at automatically deriving decision rules from process event logs. Existing approaches focus on the discovery of unary, or in some instances binary, data conditions. The discovered decision rules are usually evaluated using accuracy, but not with regard to their semantics and meaningfulness, although this is crucial for validation and the subsequent implementation/adaptation of the decision rules. Hence, this paper compares three decision mining approaches, i.e., two existing ones and one newly described approach, with respect to the meaningfulness of their results. For comparison, we use one synthetic data set for a realistic manufacturing case and the two real-world BPIC 2017/2020 logs. The discovered rules are discussed with regard to their semantics and meaningfulness.

【6】 Spline-PINN: Approaching PDEs without Data using Fast, Physics-Informed Hermite-Spline CNNs
Link: https://arxiv.org/abs/2109.07143

Authors: Nils Wandel, Michael Weinmann, Michael Neidlin, Reinhard Klein
Affiliations: University of Bonn, Delft University of Technology, RWTH Aachen
Notes: Submitted to AAAI 2022
Abstract: Partial Differential Equations (PDEs) are notoriously difficult to solve. In general, closed-form solutions are not available and numerical approximation schemes are computationally expensive. In this paper, we propose to approach the solution of PDEs based on a novel technique that combines the advantages of two recently emerging machine learning based approaches. First, physics-informed neural networks (PINNs) learn continuous solutions of PDEs and can be trained with little to no ground truth data. However, PINNs do not generalize well to unseen domains. Second, convolutional neural networks provide fast inference and generalize, but either require large amounts of training data or a physics-constrained loss based on finite differences that can lead to inaccuracies and discretization artifacts. We leverage the advantages of both of these approaches by using Hermite spline kernels in order to continuously interpolate a grid-based state representation that can be handled by a CNN. This allows for training without any precomputed training data using a physics-informed loss function only and provides fast, continuous solutions that generalize to unseen domains. We demonstrate the potential of our method on the examples of the incompressible Navier-Stokes equation and the damped wave equation. Our models are able to learn several intriguing phenomena such as Karman vortex streets, the Magnus effect, the Doppler effect, interference patterns and wave reflections. Our quantitative assessment and an interactive real-time demo show that we are narrowing the gap in accuracy of unsupervised ML based methods to industrial CFD solvers while being orders of magnitude faster.
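As a rough illustration of how Hermite splines turn grid-based coefficients into a continuous function (this is not the authors' implementation), the sketch below evaluates a 1D cubic Hermite spline on a unit-spaced grid. The function name `hermite_interpolate`, the 1D setting, and the use of values and slopes as the stored coefficients are all assumptions made for this example.

```python
def hermite_interpolate(values, slopes, x):
    """Evaluate a cubic Hermite spline on a unit-spaced grid at position x.

    values[i] and slopes[i] are the function value and first derivative at
    grid node i (hypothetical stand-ins for per-cell spline coefficients).
    """
    n = len(values)
    if n < 2 or len(slopes) != n:
        raise ValueError("need at least two nodes and matching slopes")
    if not 0 <= x <= n - 1:
        raise ValueError("x lies outside the grid")
    i = min(int(x), n - 2)  # index of the node to the left of x
    t = x - i               # local coordinate in [0, 1]
    # Standard cubic Hermite basis functions.
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return (h00 * values[i] + h10 * slopes[i]
            + h01 * values[i + 1] + h11 * slopes[i + 1])
```

Because the interpolant is a polynomial between nodes, its derivatives are available in closed form, which is what makes a physics-informed loss on a grid representation possible without finite differences.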

【7】 Parallel Constraint-Driven Inductive Logic Programming
Link: https://arxiv.org/abs/2109.07132

Authors: Andrew Cropper, Oghenejokpeme Orhobor, Cristian Dinu, Rolf Morel
Notes: Paper under review
Abstract: Multi-core machines are ubiquitous. However, most inductive logic programming (ILP) approaches use only a single core, which severely limits their scalability. To address this limitation, we introduce parallel techniques based on constraint-driven ILP where the goal is to accumulate constraints to restrict the hypothesis space. Our experiments on two domains (program synthesis and inductive general game playing) show that (i) parallelisation can substantially reduce learning times, and (ii) worker communication (i.e., sharing constraints) is important for good performance.

【8】 What Does The User Want? Information Gain for Hierarchical Dialogue Policy Optimisation
Link: https://arxiv.org/abs/2109.07129

Authors: Christian Geishauser, Songbo Hu, Hsien-chin Lin, Nurul Lubis, Michael Heck, Shutong Feng, Carel van Niekerk, Milica Gašić
Affiliations: Heinrich Heine University Düsseldorf, Germany; Department of Computer Science and Technology, University of Cambridge
Abstract: The dialogue management component of a task-oriented dialogue system is typically optimised via reinforcement learning (RL). Optimisation via RL is highly susceptible to sample inefficiency and instability. The hierarchical approach called Feudal Dialogue Management takes a step towards more efficient learning by decomposing the action space. However, it still suffers from instability because the reward is only provided at the end of the dialogue. We propose the use of an intrinsic reward based on information gain to address this issue. Our proposed reward favours actions that resolve uncertainty or query the user whenever necessary. It enables the policy to learn how to retrieve the users' needs efficiently, which is an integral aspect of every task-oriented conversation. Our algorithm, which we call FeudalGain, achieves state-of-the-art results in most environments of the PyDial framework, outperforming much more complex approaches. We confirm the sample efficiency and stability of our algorithm through experiments in simulation and a human trial.
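The abstract does not state the exact formula behind the information-gain reward. As a minimal sketch, assuming the system keeps a discrete belief over what the user wants, one common way to quantify information gain is the KL divergence between the belief after a turn and the belief before it. The function name and the choice of KL divergence are assumptions for illustration, not the FeudalGain definition.

```python
import math


def information_gain(prior, posterior, eps=1e-12):
    """Return KL(posterior || prior) in nats for two discrete beliefs.

    prior and posterior are lists of probabilities over the same
    hypothetical set of user goals; eps guards against division by zero.
    """
    if len(prior) != len(posterior):
        raise ValueError("beliefs must have the same length")
    return sum(
        q * math.log((q + eps) / (p + eps))
        for p, q in zip(prior, posterior)
        if q > 0
    )
```

A turn that leaves the belief unchanged earns zero reward under this definition, while a turn that collapses a uniform belief over four goals onto a single goal earns about log(4) nats.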

【9】 F-CAM: Full Resolution CAM via Guided Parametric Upscaling
Link: https://arxiv.org/abs/2109.07069

Authors: Soufiane Belharbi, Aydin Sarraf, Marco Pedersoli, Ismail Ben Ayed, Luke McCaffrey, Eric Granger
Affiliations: LIVIA, Dept. of Systems Engineering, École de technologie supérieure, Montreal, Canada; Goodman Cancer Research Centre, Dept. of Oncology, McGill University, Montreal, Canada; Ericsson, Montreal, Canada
Notes: 23 pages, under review
Abstract: Class Activation Mapping (CAM) methods have recently gained much attention for weakly-supervised object localization (WSOL) tasks, allowing for CNN visualization and interpretation without training on fully annotated image datasets. CAM methods are typically integrated within off-the-shelf CNN backbones, such as ResNet50. Due to convolution and downsampling/pooling operations, these backbones yield low-resolution CAMs with a down-scaling factor of up to 32, making accurate localization more difficult. Interpolation is required to restore full-size CAMs, but it does not consider the statistical properties of the objects, leading to activations with inconsistent boundaries and inaccurate localizations. As an alternative, we introduce a generic method for parametric upscaling of CAMs that allows constructing accurate full-resolution CAMs (F-CAMs). In particular, we propose a trainable decoding architecture that can be connected to any CNN classifier to produce more accurate CAMs. Given an original (low-resolution) CAM, foreground and background pixels are randomly sampled for fine-tuning the decoder. Additional priors such as image statistics and size constraints are also considered to expand and refine object boundaries. Extensive experiments using three CNN backbones and six WSOL baselines on the CUB-200-2011 and OpenImages datasets indicate that our F-CAM method yields a significant improvement in CAM localization accuracy. F-CAM performance is competitive with state-of-the-art WSOL methods, yet it requires fewer computational resources during inference.

【10】 Self-Training with Differentiable Teacher
Link: https://arxiv.org/abs/2109.07049

Authors: Simiao Zuo, Yue Yu, Chen Liang, Haoming Jiang, Siawpeng Er, Chao Zhang, Tuo Zhao, Hongyuan Zha
Affiliations: Georgia Institute of Technology; Amazon; The Chinese University of Hong Kong, Shenzhen
Abstract: Self-training achieves enormous success in various semi-supervised and weakly-supervised learning tasks. The method can be interpreted as a teacher-student framework, where the teacher generates pseudo-labels, and the student makes predictions. The two models are updated alternately. However, such a straightforward alternating update rule leads to training instability. This is because a small change in the teacher may result in a significant change in the student. To address this issue, we propose differentiable self-training, which treats the teacher-student pair as a Stackelberg game. In this game, a leader is always in a more advantageous position than a follower. In self-training, the student contributes to the prediction performance, and the teacher controls the training process by generating pseudo-labels. Therefore, we treat the student as the leader and the teacher as the follower. The leader procures its advantage by acknowledging the follower's strategy, which involves differentiable pseudo-labels and differentiable sample weights. Consequently, the leader-follower interaction can be effectively captured via the Stackelberg gradient, obtained by differentiating the follower's strategy. Experimental results on semi- and weakly-supervised classification and named entity recognition tasks show that our model outperforms existing approaches by large margins.

【11】 Attention Is Indeed All You Need: Semantically Attention-Guided Decoding for Data-to-Text NLG
Link: https://arxiv.org/abs/2109.07043

Authors: Juraj Juraska, Marilyn Walker
Affiliations: Natural Language and Dialogue Systems Lab, University of California, Santa Cruz
Notes: Accepted to INLG 2021
Abstract: Ever since neural models were adopted in data-to-text language generation, they have invariably been reliant on extrinsic components to improve their semantic accuracy, because the models normally do not exhibit the ability to generate text that reliably mentions all of the information provided in the input. In this paper, we propose a novel decoding method that extracts interpretable information from encoder-decoder models' cross-attention and uses it to infer which attributes are mentioned in the generated text; this inference is subsequently used to rescore beam hypotheses. Using this decoding method with T5 and BART, we show on three datasets its ability to dramatically reduce semantic errors in the generated outputs, while maintaining their state-of-the-art quality.

【12】 Avengers Ensemble! Improving Transferability of Authorship Obfuscation
Link: https://arxiv.org/abs/2109.07028

Authors: Muhammad Haroon, Muhammad Fareed Zaffar, Padmini Srinivasan, Zubair Shafiq
Notes: Submitted to PETS 2021
Abstract: Stylometric approaches have been shown to be quite effective for real-world authorship attribution. To mitigate the privacy threat posed by authorship attribution, researchers have proposed automated authorship obfuscation approaches that aim to conceal the stylometric artefacts that give away the identity of an anonymous document's author. Recent work has focused on authorship obfuscation approaches that rely on black-box access to an attribution classifier to evade attribution while preserving semantics. However, to be useful under a realistic threat model, it is important that these obfuscation approaches work well even when the adversary's attribution classifier is different from the one used internally by the obfuscator. Unfortunately, existing authorship obfuscation approaches do not transfer well to unseen attribution classifiers. In this paper, we propose an ensemble-based approach for transferable authorship obfuscation. Our experiments show that if an obfuscator can evade an ensemble attribution classifier, which is based on multiple base attribution classifiers, it is more likely to transfer to different attribution classifiers. Our analysis shows that ensemble-based authorship obfuscation achieves better transferability because it combines the knowledge from each of the base attribution classifiers by essentially averaging their decision boundaries.

【13】 Embedding Node Structural Role Identity Using Stress Majorization
Link: https://arxiv.org/abs/2109.07023

Authors: Lili Wang, Chenghan Huang, Weicheng Ma, Ying Lu, Soroush Vosoughi
Affiliations: Dartmouth College, Hanover, New Hampshire, USA; Millennium Management, LLC, New York, New York, USA; Stony Brook University, Stony Brook, New York, USA
Notes: In CIKM 2021
Abstract: Nodes in networks may have one or more functions that determine their role in the system. As opposed to local proximity, which captures the local context of nodes, the role identity captures the functional "role" that nodes play in a network, such as being the center of a group, or the bridge between two groups. This means that nodes far apart in a network can have similar structural role identities. Several recent works have explored methods for embedding the roles of nodes in networks. However, these methods all rely on either approximation or indirect modeling of structural equivalence. In this paper, we present a novel and flexible framework that uses stress majorization to transform the high-dimensional role identities in networks directly (without approximation or indirect modeling) to a low-dimensional embedding space. Our method is also flexible, in that it does not rely on specific structural similarity definitions. We evaluated our method on the tasks of node classification, clustering, and visualization, using three real-world and five synthetic networks. Our experiments show that our framework achieves better results than existing methods in learning node role representations.

【14】 Behavior of k-NN as an Instance-Based Explanation Method
Link: https://arxiv.org/abs/2109.06999

Authors: Chhavi Yadav, Kamalika Chaudhuri
Affiliations: UC San Diego
Abstract: Adoption of DL models in critical areas has led to an escalating demand for sound explanation methods. Instance-based explanation methods are a popular type that return selected instances from the training set to explain the predictions for a test sample. One way to connect these explanations with prediction is to ask the following counterfactual question: how do the loss and prediction for a test sample change when explanations are removed from the training set? Our paper answers this question for k-NNs, which are natural contenders for an instance-based explanation method. We first demonstrate empirically that the representation space induced by the last layer of a neural network is the best space in which to perform k-NN. Using this layer, we conduct our experiments and compare them to influence functions (IFs) (Koh and Liang, 2017), which try to answer a similar question. Our evaluations do indicate a change in loss and predictions when explanations are removed, but we do not find a trend between k and loss or prediction change. We find significant stability in the predictions and loss of MNIST vs. CIFAR-10. Surprisingly, we do not observe much difference in the behavior of k-NNs vs. IFs on this question. We attribute this to training set subsampling for IFs.
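The counterfactual question in the abstract can be made concrete with a toy example. The sketch below is an assumption-laden illustration, not the paper's experimental setup: it uses 1D features in place of last-layer network representations, treats the k nearest training points as the "explanations", removes them, and predicts again. The function names are hypothetical.

```python
from collections import Counter


def knn(train, query, k):
    """Return (predicted_label, neighbour_indices) for a toy 1D dataset.

    train is a list of (feature, label) pairs; the k nearest points by
    absolute distance vote on the label.
    """
    order = sorted(range(len(train)), key=lambda i: abs(train[i][0] - query))
    neighbours = order[:k]
    votes = Counter(train[i][1] for i in neighbours)
    return votes.most_common(1)[0][0], neighbours


def prediction_after_removal(train, query, k):
    """Predict, drop the k explanations from the training set, predict again."""
    label, neighbours = knn(train, query, k)
    removed = set(neighbours)
    reduced = [point for i, point in enumerate(train) if i not in removed]
    new_label, _ = knn(reduced, query, k)
    return label, new_label
```

On a dataset with two well-separated clusters of three points each and k = 3, removing the explanations for a query inside one cluster flips the prediction to the other cluster's label, because no points from the original cluster remain.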

【15】 A Crawler Architecture for Harvesting the Clear, Social, and Dark Web for IoT-Related Cyber-Threat Intelligence
Link: https://arxiv.org/abs/2109.06932

Authors: Paris Koloveas, Thanasis Chantzios, Christos Tryfonopoulos, Spiros Skiadopoulos
Affiliations: University of the Peloponnese, Tripolis, Greece
Abstract: The clear, social, and dark web have lately been identified as rich sources of valuable cyber-security information that, given the appropriate tools and methods, may be identified, crawled and subsequently leveraged into actionable cyber-threat intelligence. In this work, we focus on the information gathering task, and present a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in the dark web. The proposed architecture adopts a two-phase approach to data harvesting. In the first phase, a machine learning-based crawler is used to direct the harvesting towards websites of interest, while in the second phase state-of-the-art statistical language modelling techniques are used to represent the harvested information in a latent low-dimensional feature space and rank it based on its potential relevance to the task at hand. The proposed architecture is realised using exclusively open-source tools, and a preliminary evaluation with crowdsourced results demonstrates its effectiveness.

【16】 Disentangling Generative Factors of Physical Fields Using Variational Autoencoders
Link: https://arxiv.org/abs/2109.07399

Authors: Christian Jacobsen, Karthik Duraisamy
Affiliations: University of Michigan, Ann Arbor, MI
Abstract: The ability to extract generative parameters from high-dimensional fields of data in an unsupervised manner is a highly desirable yet unrealized goal in computational physics. This work explores the use of variational autoencoders (VAEs) for non-linear dimension reduction with the aim of disentangling the low-dimensional latent variables to identify independent physical parameters that generated the data. A disentangled decomposition is interpretable and can be transferred to a variety of tasks including generative modeling, design optimization, and probabilistic reduced order modelling. A major emphasis of this work is to characterize disentanglement using VAEs while minimally modifying the classic VAE loss function (i.e., the ELBO) to maintain high reconstruction accuracy. Disentanglement is shown to be highly sensitive to rotations of the latent space, hyperparameters, random initializations and the learning schedule. The loss landscape is characterized by over-regularized local minima which surround desirable solutions. We illustrate comparisons between disentangled and entangled representations by juxtaposing learned latent distributions and the 'true' generative factors in a model porous flow problem. Implementing hierarchical priors (HP) is shown to better facilitate the learning of disentangled representations over the classic VAE. The choice of the prior distribution is shown to have a dramatic effect on disentanglement. In particular, the regularization loss is unaffected by latent rotation when training with rotationally-invariant priors, and thus learning non-rotationally-invariant priors aids greatly in capturing the properties of generative factors, improving disentanglement. Some issues inherent to training VAEs, such as the convergence to over-regularized local minima, are illustrated and investigated, and potential techniques for mitigation are presented.

【17】 How to use KL-divergence to construct conjugate priors, with well-defined non-informative limits, for the multivariate Gaussian
Link: https://arxiv.org/abs/2109.07384

Authors: Niko Brümmer
Notes: 10 pages
Abstract: The Wishart distribution is the standard conjugate prior for the precision of the multivariate Gaussian likelihood when the mean is known, while the normal-Wishart can be used when the mean is also unknown. It is, however, not so obvious how to assign values to the hyperparameters of these distributions. In particular, when forming non-informative limits of these distributions, the shape (or degrees of freedom) parameter of the Wishart must be handled with care. The intuitive solution of directly interpreting the shape as a pseudocount and letting it go to zero, as proposed by some authors, violates the restrictions on the shape parameter. We show how to use the scaled KL-divergence between multivariate Gaussians as an energy function to construct Wishart and normal-Wishart conjugate priors. When used as informative priors, the salient feature of these distributions is the mode, while the KL scaling factor serves as the pseudocount. The scale factor can be taken down to the limit at zero, to form non-informative priors that do not violate the restrictions on the Wishart shape parameter. This limit is non-informative in the sense that the posterior mode is identical to the maximum likelihood estimate of the Gaussian likelihood parameters.
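For reference, the KL divergence between two d-dimensional Gaussians has a standard closed form, shown in the first line below. The second line sketches what an energy-based prior with scale factor ν could look like; the symbols ν, μ₀, Σ₀ and the order of the KL arguments are assumptions for illustration, since the abstract does not give them.

```latex
\mathrm{KL}\big(\mathcal{N}(\mu_0,\Sigma_0)\,\|\,\mathcal{N}(\mu_1,\Sigma_1)\big)
  = \tfrac{1}{2}\Big[\operatorname{tr}\!\big(\Sigma_1^{-1}\Sigma_0\big)
  + (\mu_1-\mu_0)^{\top}\Sigma_1^{-1}(\mu_1-\mu_0)
  - d + \ln\frac{\det\Sigma_1}{\det\Sigma_0}\Big]

% Hypothetical energy-based prior with pseudocount-like scale \nu > 0:
p(\mu,\Sigma) \;\propto\;
  \exp\!\Big(-\nu\,\mathrm{KL}\big(\mathcal{N}(\mu_0,\Sigma_0)\,\|\,\mathcal{N}(\mu,\Sigma)\big)\Big)
```

Under this reading, the prior's mode sits at (μ₀, Σ₀) because the KL term vanishes there, and letting ν approach zero flattens the prior, which matches the abstract's description of the mode as the salient feature and the scale factor as the pseudocount.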

【18】 Distribution-free Contextual Dynamic Pricing
Link: https://arxiv.org/abs/2109.07340

Authors: Yiyun Luo, Will Wei Sun, Yufeng Liu
Affiliations: Krannert School of Management, Purdue University; Department of Statistics and Operations Research
Abstract: Contextual dynamic pricing aims to set personalized prices based on sequential interactions with customers. At each time period, a customer who is interested in purchasing a product comes to the platform. The customer's valuation for the product is a linear function of contexts, including product and customer features, plus some random market noise. The seller does not observe the customer's true valuation, but instead needs to learn the valuation by leveraging contextual information and historical binary purchase feedback. Existing models typically assume full or partial knowledge of the random noise distribution. In this paper, we consider contextual dynamic pricing with unknown random noise in the valuation model. Our distribution-free pricing policy learns both the contextual function and the market noise simultaneously. A key ingredient of our method is a novel perturbed linear bandit framework, where a modified linear upper confidence bound algorithm is proposed to balance the exploration of market noise and the exploitation of the current knowledge for better pricing. We establish the regret upper bound and a matching lower bound of our policy in the perturbed linear bandit framework and prove a sub-linear regret bound in the considered pricing problem. Finally, we demonstrate the superior performance of our policy on simulations and a real-life auto-loan dataset.
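The valuation model described in the abstract (a linear function of the context plus market noise, observed only through a binary purchase decision) can be written down in a few lines. This is a minimal sketch of the problem setting only, not of the paper's pricing policy; the function names and the convention that a customer buys when the valuation is at least the price are assumptions.

```python
def valuation(theta, context, noise):
    """Customer valuation: linear in the context features plus market noise."""
    return sum(t * c for t, c in zip(theta, context)) + noise


def purchase_feedback(theta, context, noise, price):
    """Binary feedback seen by the seller: 1 if the customer buys, else 0."""
    return 1 if valuation(theta, context, noise) >= price else 0


def revenue(theta, context, noise, price):
    """Revenue for one interaction: the posted price if a purchase happens."""
    return price * purchase_feedback(theta, context, noise, price)
```

The seller never sees `valuation` directly, only the output of `purchase_feedback`, which is why both the coefficients and the noise distribution have to be learned from posted prices and yes/no outcomes.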

【19】 DeFungi: Direct Mycological Examination of Microscopic Fungi Images
Link: https://arxiv.org/abs/2109.07322

Authors: Camilo Javier Pineda Sopo, Farshid Hajati, Soheila Gheisari
Affiliations: College of Engineering and Science, Victoria University Sydney, Sydney, NSW, Australia
Abstract: Traditionally, diagnosis and treatment of fungal infections in humans depend heavily on face-to-face consultations or examinations made by specialized laboratory scientists known as mycologists. In many cases, such as the recent mucormycosis spread in the COVID-19 pandemic, an initial treatment can be safely suggested to the patient during the earliest stage of the mycological diagnostic process by performing a direct examination of biopsies or samples through a microscope. Computer-aided diagnosis systems using deep learning models have been trained and used for the late mycological diagnostic stages. However, there are no reference works in the literature for the early stages. A mycological laboratory in Colombia donated the images used for the development of this research work. They were manually labelled into five classes and curated with the assistance of a subject matter expert. The images were later cropped and patched with automated code routines to produce the final dataset. This paper presents experimental results classifying five fungi types using two different deep learning approaches and three different convolutional neural network models: VGG16, Inception V3, and ResNet50. The first approach benchmarks the classification performance for the models trained from scratch, while the second approach benchmarks the classification performance using pre-trained models based on the ImageNet dataset. Using k-fold cross-validation testing on the 5-class dataset, the best performing model trained from scratch was Inception V3, reporting 73.2% accuracy. The best performing model using transfer learning was VGG16, reporting 85.04%. The statistics provided by the two approaches create an initial point of reference to encourage future research works to improve classification performance. Furthermore, the dataset built is published on Kaggle and GitHub to foster future research.

【20】 Testing Self-Organized Criticality Across the Main Sequence using Stellar Flares from TESS
Link: https://arxiv.org/abs/2109.07011

Authors: Adina D. Feinstein, Darryl Z. Seligman, Maximilian N. Günther, Fred C. Adams
Affiliations: Department of Astronomy and Astrophysics, University of Chicago, Chicago, IL, USA; Department of the Geophysical Sciences, University of Chicago, Chicago, IL, USA
Comments: 6 pages, 3 figures, submitted to journal
Abstract: Stars produce explosive flares, which are believed to be powered by the release of energy stored in coronal magnetic field configurations. It has been shown that solar flares exhibit energy distributions typical of self-organized critical systems. This study applies a novel flare detection technique to data obtained by NASA's TESS mission and identifies $\sim10^6$ flaring events on $\sim10^5$ stars across spectral types. Our results suggest that magnetic reconnection events that maintain the topology of the magnetic field in a self-organized critical state are ubiquitous among stellar coronae.
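Editor's sketch: self-organized critical systems are commonly associated with power-law event-size distributions, $p(E) \propto E^{-\alpha}$ above some threshold. The snippet below shows one standard way to estimate such an exponent, the continuous maximum-likelihood estimator $\hat{\alpha} = 1 + n / \sum_i \ln(E_i / E_{\min})$. It is not the paper's detection or fitting pipeline; the function names, the threshold `e_min`, and the synthetic energies are all assumptions made for illustration.

```python
import math

def powerlaw_mle(energies, e_min):
    """Continuous maximum-likelihood estimate of alpha for p(E) ~ E^(-alpha), E >= e_min."""
    tail = [e for e in energies if e >= e_min]
    if not tail:
        raise ValueError("no events at or above e_min")
    log_sum = sum(math.log(e / e_min) for e in tail)
    if log_sum == 0.0:
        raise ValueError("all events equal e_min; the exponent is undefined")
    return 1.0 + len(tail) / log_sum

def synthetic_energies(alpha, e_min, n):
    """Deterministic power-law sample: inverse CDF evaluated at evenly spaced quantiles."""
    return [e_min * (1.0 - (i + 0.5) / n) ** (-1.0 / (alpha - 1.0)) for i in range(n)]

# Hypothetical check on synthetic data (not TESS measurements).
energies = synthetic_energies(alpha=2.0, e_min=1.0, n=10000)
print(round(powerlaw_mle(energies, e_min=1.0), 2))
```

The estimate is sensitive to the choice of `e_min`, so in practice the threshold would need to be justified separately, for example by the detection completeness limit.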

【21】 Scalable Average Consensus with Compressed Communications
Link: https://arxiv.org/abs/2109.06996

Authors: Mohammad Taha Toghani, César A. Uribe
Affiliation: Department of Electrical and Computer Engineering, Rice University
Abstract: We propose a new decentralized average consensus algorithm with compressed communication that scales linearly with the network size $n$. We prove that the proposed method converges to the average of the initial values held locally by the agents of a network when agents are allowed to communicate with compressed messages. The proposed algorithm works for a broad class of compression operators (possibly biased), where agents interact over arbitrary static, undirected, and connected networks. We further present numerical experiments that confirm our theoretical results and illustrate the scalability and communication efficiency of our algorithm.
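Editor's sketch: to make the setting concrete, the toy below runs average consensus in which agents exchange only compressed corrections. It is not the authors' algorithm. It uses a generic error-feedback gossip update, and the compressor `quantize_pow2` (keep only the sign and the power-of-two exponent, a biased operator), the step size `gamma`, the Metropolis weights, and the ring topology are all assumptions chosen for illustration.

```python
import math

def quantize_pow2(v):
    """Biased compressor: transmit only the sign and power-of-two exponent of v."""
    if v == 0.0:
        return 0.0
    return math.copysign(2.0 ** math.floor(math.log2(abs(v))), v)

def compressed_consensus(x0, neighbors, gamma=0.5, steps=300):
    """Run compressed gossip on an undirected graph given as {node: [neighbors]}."""
    n = len(x0)
    x = list(x0)        # private values held by each agent
    x_hat = [0.0] * n   # public estimates, updated only through compressed messages
    for _ in range(steps):
        x_new = []
        for i in range(n):
            # Metropolis weights are symmetric, so the network average is preserved.
            pull = sum(
                (x_hat[j] - x_hat[i]) / (1 + max(len(neighbors[i]), len(neighbors[j])))
                for j in neighbors[i]
            )
            x_new.append(x[i] + gamma * pull)
        x = x_new
        for i in range(n):
            # Each agent broadcasts a compressed correction to its public estimate.
            x_hat[i] += quantize_pow2(x[i] - x_hat[i])
    return x

# Hypothetical 5-agent ring; the initial average is 20.
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(compressed_consensus([0.0, 10.0, 20.0, 30.0, 40.0], ring))
```

Because the mixing weights are symmetric, the sum of the private values does not change from step to step, so any consensus the agents reach must be the initial average. How fast they get there depends on `gamma` and on how aggressive the compressor is.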

【22】 Combining GEDI and Sentinel-2 for wall-to-wall mapping of tall and short crops
Link: https://arxiv.org/abs/2109.06972

Authors: Stefania Di Tommaso, Sherrie Wang, David B. Lobell
Affiliations: Department of Earth System Science and Center on Food Security and the Environment, Stanford University; Institute for Computational and Mathematical Engineering, Stanford University; Goldman School of Public Policy, University of California, Berkeley
Abstract: High-resolution crop type maps are an important tool for improving food security, and remote sensing is increasingly used to create such maps in regions that possess ground truth labels for model training. However, these labels are absent in many regions, and models trained in other regions on typical satellite features, such as those from optical sensors, often exhibit low performance when transferred. Here we explore the use of NASA's Global Ecosystem Dynamics Investigation (GEDI) spaceborne lidar instrument, combined with Sentinel-2 optical data, for crop type mapping. Using data from three major cropped regions (in China, France, and the United States), we first demonstrate that GEDI energy profiles are capable of reliably distinguishing maize, a crop typically above 2 m in height, from shorter crops like rice and soybean. We further show that these GEDI profiles provide much more invariant features across geographies compared to spectral and phenological features detected by passive optical sensors. GEDI is able to distinguish maize from other crops within each region with accuracies higher than 84%, and able to transfer across regions with accuracies higher than 82%, compared to 64% for transfer of optical features. Finally, we show that GEDI profiles can be used to generate training labels for models based on optical imagery from Sentinel-2, thereby enabling the creation of 10 m wall-to-wall maps of tall versus short crops in label-scarce regions. As maize is the second most widely grown crop in the world and often the only tall crop grown within a landscape, we conclude that GEDI offers great promise for improving global crop type maps.

【23】 Targeted Cross-Validation
Link: https://arxiv.org/abs/2109.06949

Authors: Jiawei Zhang, Jie Ding, Yuhong Yang
Affiliation: School of Statistics, University of Minnesota, and
Abstract: In many applications, we have access to the complete dataset but are only interested in the prediction of a particular region of predictor variables. A standard approach is to find the globally best modeling method from a set of candidate methods. However, it is perhaps rare in reality that one candidate method is uniformly better than the others. A natural approach for this scenario is to apply a weighted $L_2$ loss in performance assessment to reflect the region-specific interest. We propose targeted cross-validation (TCV) to select models or procedures based on a general weighted $L_2$ loss. We show that TCV is consistent in selecting the best performing candidate under the weighted $L_2$ loss. Experimental studies are used to demonstrate the use of TCV and its potential advantage over the global CV or the approach of using only local data for modeling a local region. Previous investigations on CV have relied on the condition that when the sample size is large enough, the ranking of two candidates stays the same. However, in many applications with changing data-generating processes or highly adaptive modeling methods, the relative performance of the methods is not static as the sample size varies. Even with a fixed data-generating process, it is possible that the ranking of two methods switches infinitely many times. In this work, we broaden the concept of selection consistency by allowing the best candidate to switch as the sample size varies, and then establish the consistency of TCV. This flexible framework can be applied to high-dimensional and complex machine learning scenarios where the relative performances of modeling procedures are dynamic.
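Editor's sketch: the core idea of scoring candidates with a weighted $L_2$ loss can be illustrated with an ordinary K-fold loop in which each validation point's squared error is multiplied by a weight that depends on its predictor value. This is a minimal illustration under assumptions, not the paper's exact TCV procedure or splitting scheme. The two candidate fitters, the fold assignment, the indicator weight, and the toy data are all hypothetical.

```python
def fit_mean(xs, ys):
    """Candidate 1: predict the training mean everywhere."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Candidate 2: ordinary least-squares line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: mean_y + slope * (x - mean_x)

def weighted_cv_loss(fit, xs, ys, weight, k=5):
    """K-fold CV estimate of the weighted squared error, normalized by total weight."""
    total, weight_sum = 0.0, 0.0
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            w = weight(xs[i])
            total += w * (ys[i] - model(xs[i])) ** 2
            weight_sum += w
    if weight_sum == 0:
        raise ValueError("weight function is zero on every validation point")
    return total / weight_sum

def select_candidate(candidates, xs, ys, weight, k=5):
    """Return the name of the candidate with the smallest weighted CV loss."""
    return min(candidates, key=lambda name: weighted_cv_loss(candidates[name], xs, ys, weight, k))

# Hypothetical data with a kink at x = 0; interest is restricted to x >= 1.
xs = [i / 10 for i in range(-20, 21)]
ys = [max(x, 0.0) for x in xs]
target_region = lambda x: 1.0 if x >= 1.0 else 0.0
print(select_candidate({"mean": fit_mean, "line": fit_line}, xs, ys, target_region))
```

Note that every candidate is still trained on all available training points. Only the evaluation is focused on the target region, which is what distinguishes this from fitting on local data alone.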

【24】 Reconstruction on Trees and Low-Degree Polynomials
Link: https://arxiv.org/abs/2109.06915

Authors: Frederic Koehler, Elchanan Mossel
Affiliations: Simons Institute (part of this work was completed while at the Department of Mathematics); Department of Mathematics and IDSS, Massachusetts Institute of Technology
Comments: 20 pages, comments welcome
Abstract: The study of Markov processes and broadcasting on trees has deep connections to a variety of areas including statistical physics, phylogenetic reconstruction, MCMC algorithms, and community detection in random graphs. Notably, the celebrated Belief Propagation (BP) algorithm achieves Bayes-optimal performance for the reconstruction problem of predicting the value of the Markov process at the root of the tree from its values at the leaves. Recently, the analysis of low-degree polynomials has emerged as a valuable tool for predicting computational-to-statistical gaps. In this work, we investigate the performance of low-degree polynomials for the reconstruction problem on trees. Perhaps surprisingly, we show that there are simple tree models with $N$ leaves where (1) nontrivial reconstruction of the root value is possible with a simple polynomial time algorithm and with robustness to noise, but not with any polynomial of degree $N^{c}$ for a constant $c > 0$, and (2) when the tree is unknown and given multiple samples with correlated root assignments, nontrivial reconstruction of the root value is possible with a simple, noise-robust, and computationally efficient SQ (Statistical Query) algorithm but not with any polynomial of degree $N^c$. These results clarify some of the limitations of low-degree polynomials vs. polynomial time algorithms for Bayesian estimation problems. They also complement recent work of Moitra, Mossel, and Sandon who studied the circuit complexity of Belief Propagation. We pose related open questions about low-degree polynomials and the Kesten-Stigum threshold.
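Editor's sketch: the reconstruction problem mentioned above can be made concrete with a tiny belief-propagation routine. The example assumes the simplest broadcasting model, a binary state copied along each edge and flipped independently with probability `flip` (with 0 < `flip` < 1), on a known tree. The nested-list tree encoding and the function names are hypothetical, and this is not one of the specific tree models constructed in the paper.

```python
def leaf_likelihoods(tree, flip):
    """Return [P(leaves below | node=0), P(leaves below | node=1)] by upward message passing.

    A tree is either an observed leaf label (0 or 1) or a list of child subtrees.
    """
    if isinstance(tree, int):
        return [1.0, 0.0] if tree == 0 else [0.0, 1.0]
    like = [1.0, 1.0]
    for child in tree:
        child_like = leaf_likelihoods(child, flip)
        for s in (0, 1):
            # Sum over the child's hidden state: copied w.p. 1 - flip, flipped w.p. flip.
            like[s] *= (1.0 - flip) * child_like[s] + flip * child_like[1 - s]
    return like

def root_posterior(tree, flip, prior_one=0.5):
    """Posterior probability that the root state is 1 given all leaf labels."""
    like0, like1 = leaf_likelihoods(tree, flip)
    evidence = (1.0 - prior_one) * like0 + prior_one * like1
    return prior_one * like1 / evidence

# Hypothetical depth-2 binary tree with leaf labels 1, 1, 1, 0.
print(root_posterior([[1, 1], [1, 0]], flip=0.1))
```

On a tree the upward pass computes the exact posterior, which is why BP is Bayes-optimal here. For deep trees the raw likelihood products underflow, so a practical version would normalize the messages or work in log space.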

Machine-translated; for reference only.
