
Machine Learning arXiv Daily Digest [12.2]

arXiv每日学术速递 (arXiv Daily Digest) • 2 years ago • 2255 views

Click "Read the original" to visit arxivdaily.com, covering CS | Physics | Math | Economics | Statistics | Finance | Biology | Electrical Engineering, with search, bookmarking, and more!


cs.LG: 117 papers in total today


Graph-related (graph learning | graph neural networks | graph optimization, etc.) (6 papers)

【1】 A Graph Neural Networks based Framework for Topology-Aware Proactive SLA Management in a Latency Critical NFV Application Use-case
Link: https://arxiv.org/abs/2212.00714

Authors: Nikita Jalodia, Mohit Taneja, Alan Davy
Affiliations: Walton Institute for Information and Communication Systems Science, South East Technological University, Waterford, Ireland
Abstract: Recent advancements in the rollout of 5G and 6G have led to the emergence of a new range of latency-critical applications delivered via a Network Function Virtualization (NFV) enabled paradigm of flexible and softwarized communication networks. Evolving verticals like telecommunications, smart grid, virtual reality (VR), industry 4.0, automated vehicles, etc. are driven by the vision of low latency and high reliability, and there is a wide gap to efficiently bridge the Quality of Service (QoS) constraints for both the service providers and the end-user. In this work, we look to tackle the over-provisioning of latency-critical services by proposing a proactive SLA management framework leveraging Graph Neural Networks (GNN) and Deep Reinforcement Learning (DRL) to balance the trade-off between efficiency and reliability. To summarize our key contributions: 1) we compose a graph-based spatio-temporal multivariate time-series forecasting model with multiple time-step predictions in a multi-output scenario, delivering 74.62% improved performance over the established baseline state-of-the-art model on the use-case; and 2) we leverage realistic SLA definitions for the use-case to achieve a dynamic SLA-aware oversight for scaling policy management with DRL.
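
For readers who want a concrete picture of contribution 1), the sketch below shows one way a topology-aware multi-step forecaster can be wired up: a graph-convolution step mixes information across the service topology, a GRU encodes the time dimension, and a joint head emits several future steps for every node at once. It is a minimal illustration under assumed shapes (4 nodes, 3 features, a 5-step horizon), not the authors' architecture.

```python
# Minimal sketch (not the authors' code) of a topology-aware multi-step forecaster.
import torch
import torch.nn as nn

class GraphConvGRUForecaster(nn.Module):
    def __init__(self, num_nodes, in_feats, hidden, horizon):
        super().__init__()
        self.proj = nn.Linear(in_feats, hidden)          # shared node transform
        self.gru = nn.GRU(num_nodes * hidden, 128, batch_first=True)
        self.head = nn.Linear(128, num_nodes * horizon)  # joint multi-output head
        self.num_nodes, self.horizon = num_nodes, horizon

    def forward(self, x, adj_norm):
        # x: (batch, time, nodes, feats); adj_norm: row-normalized adjacency
        b, t, n, _ = x.shape
        h = torch.relu(adj_norm @ self.proj(x))          # one-hop spatial mixing per step
        _, last = self.gru(h.reshape(b, t, -1))          # temporal encoding
        return self.head(last.squeeze(0)).view(b, self.horizon, n)

adj = torch.eye(4) + torch.rand(4, 4).round()            # toy 4-node NFV topology
model = GraphConvGRUForecaster(num_nodes=4, in_feats=3, hidden=8, horizon=5)
out = model(torch.randn(2, 12, 4, 3), adj / adj.sum(1, keepdim=True))
print(out.shape)                                         # torch.Size([2, 5, 4])
```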


【2】 Graph Anomaly Detection via Multi-Scale Contrastive Learning Networks with Augmented View
Link: https://arxiv.org/abs/2212.00535

Authors: Jingcan Duan, Siwei Wang, Pei Zhang, En Zhu, Jingtao Hu, Hu Jin, Yue Liu, Zhibin Dong
Affiliations: College of Computer, National University of Defense Technology, Changsha, China
Abstract: Graph anomaly detection (GAD) is a vital task in graph-based machine learning and has been widely applied in many real-world applications. The primary goal of GAD is to capture anomalous nodes from graph datasets, which evidently deviate from the majority of nodes. Recent methods have paid attention to various scales of contrastive strategies for GAD, i.e., node-subgraph and node-node contrasts. However, they neglect the subgraph-subgraph comparison information, namely that normal and abnormal subgraph pairs behave differently in terms of embeddings and structures in GAD, resulting in sub-optimal task performance. In this paper, we fulfill the above idea in the proposed multi-view multi-scale contrastive learning framework, putting subgraph-subgraph contrast into practice for the first time. To be specific, we regard the original input graph as the first view and generate the second view by graph augmentation with edge modifications. With the guidance of maximizing the similarity of the subgraph pairs, the proposed subgraph-subgraph contrast contributes to more robust subgraph embeddings despite structure variation. Moreover, the introduced subgraph-subgraph contrast cooperates well with the widely adopted node-subgraph and node-node contrastive counterparts for mutual GAD performance promotion. Besides, we also conduct sufficient experiments to investigate the impact of different graph augmentation approaches on detection performance. The comprehensive experimental results well demonstrate the superiority of our method compared with the state-of-the-art approaches and the effectiveness of the multi-view subgraph pair contrastive strategy for the GAD task.
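
The core augmentation-plus-contrast idea can be sketched in a few lines: build a second view by flipping a small fraction of edges, pool each node's neighborhood into a subgraph embedding, and treat the two views of the same subgraph as the positive pair. Everything below (the GCN layer, the flip rate, the InfoNCE form) is an illustrative stand-in for the paper's framework, not its implementation.

```python
import torch
import torch.nn.functional as F

def edge_modified_view(adj, p=0.1):
    flip = (torch.rand_like(adj) < p).float()
    aug = (adj + flip) % 2                               # flip ~p of the entries
    return torch.triu(aug, 1) + torch.triu(aug, 1).T     # symmetric, no self-loops

def subgraph_embed(adj, x, w):
    h = torch.relu(adj @ x @ w)                          # one GCN-style layer
    return F.normalize(adj @ h, dim=1)                   # pool each node's neighborhood

def subgraph_contrast(z1, z2, tau=0.5):
    logits = z1 @ z2.T / tau                             # view-1 subgraphs vs. view-2
    return F.cross_entropy(logits, torch.arange(z1.size(0)))  # positives on the diagonal

adj = (torch.rand(6, 6) < 0.4).float()
adj = ((adj + adj.T) > 0).float().fill_diagonal_(0)
x, w = torch.randn(6, 4), torch.randn(4, 8, requires_grad=True)
loss = subgraph_contrast(subgraph_embed(adj, x, w),
                         subgraph_embed(edge_modified_view(adj), x, w))
```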


【3】 GrannGAN: Graph annotation generative adversarial networks
Link: https://arxiv.org/abs/2212.00449

Authors: Yoann Boget, Magda Gregorova, Alexandros Kalousis
Affiliations: University of Geneva and Geneva School for Business Administration HES-SO, Rue de la Tambourine, Carouge, Switzerland; Center for Artificial Intelligence and Robotics (CAIRO), FHWS, Franz-Horn-Strasse, Würzburg-Schweinfurt, Germany
Comments: Published as Journal Track paper ACML 2022
Abstract: We consider the problem of modelling high-dimensional distributions and generating new examples of data with complex relational feature structure coherent with a graph skeleton. The model we propose tackles the problem of generating the data features constrained by the specific graph structure of each data point by splitting the task into two phases. In the first it models the distribution of features associated with the nodes of the given graph, in the second it complements the edge features conditionally on the node features. We follow the strategy of implicit distribution modelling via generative adversarial network (GAN) combined with permutation equivariant message passing architecture operating over the sets of nodes and edges. This enables generating the feature vectors of all the graph objects in one go (in 2 phases) as opposed to the much slower one-by-one generation of sequential models, prevents the need for expensive graph matching procedures usually needed for likelihood-based generative models, and uses the network capacity efficiently by being insensitive to the particular node ordering in the graph representation. To the best of our knowledge, this is the first method that models the feature distribution along the graph skeleton allowing for the generation of annotated graphs with user-specified structures. Our experiments demonstrate the ability of our model to learn complex structured distributions through quantitative evaluation over three annotated graph datasets.


【4】 Component Segmentation of Engineering Drawings Using Graph Convolutional Networks
Link: https://arxiv.org/abs/2212.00290

Authors: Wentai Zhang, Joe Joseph, Yue Yin, Liuyue Xie, Tomotake Furuhata, Soji Yamakawa, Kenji Shimada, Levent Burak Kara
Affiliations: Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
Comments: Preprint submitted to Computers in Industry
Abstract: We present a data-driven framework to automate the vectorization and machine interpretation of 2D engineering part drawings. In industrial settings, most manufacturing engineers still rely on manual reads to identify the topological and manufacturing requirements from drawings submitted by designers. The interpretation process is laborious and time-consuming, which severely inhibits the efficiency of part quotation and manufacturing tasks. While recent advances in image-based computer vision methods have demonstrated great potential in interpreting natural images through semantic segmentation approaches, the application of such methods in parsing engineering technical drawings into semantically accurate components remains a significant challenge. The severe pixel sparsity in engineering drawings also restricts the effective featurization of image-based data-driven methods. To overcome these challenges, we propose a deep learning based framework that predicts the semantic type of each vectorized component. Taking a raster image as input, we vectorize all components through thinning, stroke tracing, and cubic bezier fitting. Then a graph of such components is generated based on the connectivity between the components. Finally, a graph convolutional neural network is trained on this graph data to identify the semantic type of each component. We test our framework in the context of semantic segmentation of text, dimension, and contour components in engineering drawings. Results show that our method yields the best performance compared to recent image- and graph-based segmentation methods.


【5】 Semi-Supervised Heterogeneous Graph Learning with Multi-level Data Augmentation
Link: https://arxiv.org/abs/2212.00024

Authors: Ying Chen, Siwei Qiang, Mingming Ha, Xiaolei Liu, Shaoshuai Li, Lingfeng Yuan, Xiaobo Guo, Zhenfeng Zhu
Affiliations: School of Automation and Electrical Engineering, University of Science and Technology Beijing; MYbank
Abstract: In recent years, semi-supervised graph learning with data augmentation (DA) has been the most commonly used and best-performing method to enhance model robustness in sparse scenarios with few labeled samples. Differing from homogeneous graphs, DA in heterogeneous graphs poses greater challenges: the heterogeneity of information requires DA strategies to effectively handle heterogeneous relations, considering the information contribution of different types of neighbors and edges to the target nodes. Furthermore, over-squashing of information is caused by the negative curvature formed by the non-uniform distribution and strong clustering in complex graphs. To address these challenges, this paper presents a novel method named Semi-Supervised Heterogeneous Graph Learning with Multi-level Data Augmentation (HG-MDA). For the problem of heterogeneity of information in DA, node and topology augmentation strategies are proposed for the characteristics of heterogeneous graphs, and meta-relation-based attention is applied as one of the indexes for selecting augmented nodes and edges. For the problem of over-squashing of information, triangle-based edge adding and removing are designed to alleviate the negative curvature and bring topological gain. Finally, the loss function consists of the cross-entropy loss for labeled data and the consistency regularization for unlabeled data. In order to effectively fuse the prediction results of various DA strategies, sharpening is used. Experiments on the public datasets ACM, DBLP, and OGB, and the industry dataset MB show that HG-MDA outperforms current SOTA models. Additionally, HG-MDA is applied to user identification in internet finance scenarios, helping the business add 30% key users, and increase loans and balances by 3.6%, 11.1%, and 9.8%.


【6】 Graph Convolutional Neural Networks as Parametric CoKleisli morphisms
Link: https://arxiv.org/abs/2212.00542

Authors: Bruno Gavranović, Mattia Villani
Comments: 21 pages
Abstract: We define the bicategory of Graph Convolutional Neural Networks $\mathbf{GCNN}_n$ for an arbitrary graph with $n$ nodes. We show it can be factored through the already existing categorical constructions for deep learning called $\mathbf{Para}$ and $\mathbf{Lens}$ with the base category set to the CoKleisli category of the product comonad. We prove that there exists an injective-on-objects, faithful 2-functor $\mathbf{GCNN}_n \to \mathbf{Para}(\mathsf{CoKl}(\mathbb{R}^{n \times n} \times -))$. We show that this construction allows us to treat the adjacency matrix of a GCNN as a global parameter instead of a local, layer-wise one. This gives us a high-level categorical characterisation of a particular kind of inductive bias GCNNs possess. Lastly, we hypothesize about possible generalisations of GCNNs to general message-passing graph neural networks, connections to equivariant learning, and the (lack of) functoriality of activation functions.


Transformer (2 papers)

【1】 Purifier: Defending Data Inference Attacks via Transforming Confidence Scores
Link: https://arxiv.org/abs/2212.00612

Authors: Ziqi Yang, Lijin Wang, Da Yang, Jie Wan, Ziming Zhao, Ee-Chien Chang, Fan Zhang, Kui Ren
Affiliations: ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University; Key Laboratory of Blockchain and Cyberspace Governance of Zhejiang Province; Jiaxing Research Institute, Zhejiang University
Comments: Accepted by AAAI 2023
Abstract: Neural networks are susceptible to data inference attacks such as the membership inference attack, the adversarial model inversion attack and the attribute inference attack, where the attacker could infer useful information such as the membership, the reconstruction or the sensitive attributes of a data sample from the confidence scores predicted by the target classifier. In this paper, we propose a method, namely PURIFIER, to defend against membership inference attacks. It transforms the confidence score vectors predicted by the target classifier and makes the purified confidence scores indistinguishable in individual shape, statistical distribution and prediction label between members and non-members. The experimental results show that PURIFIER helps defend against membership inference attacks with high effectiveness and efficiency, outperforming previous defense methods, and also incurs negligible utility loss. Besides, our further experiments show that PURIFIER is also effective in defending against adversarial model inversion attacks and attribute inference attacks. For example, the inversion error is raised about 4+ times on the Facescrub530 classifier, and the attribute inference accuracy drops significantly when PURIFIER is deployed in our experiments.
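
A minimal sketch of the defense's shape: pass the classifier's confidence vector through a small bottleneck autoencoder fitted on reference confidences, and release the reconstruction instead of the raw scores. The sizes and the label-preserving fallback below are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConfidencePurifier(nn.Module):
    def __init__(self, num_classes, bottleneck=4):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(num_classes, 32), nn.ReLU(),
                                 nn.Linear(32, bottleneck))
        self.dec = nn.Sequential(nn.Linear(bottleneck, 32), nn.ReLU(),
                                 nn.Linear(32, num_classes))

    def forward(self, conf):
        purified = F.softmax(self.dec(self.enc(conf)), dim=1)
        # preserve utility: fall back to the raw scores if the label would flip
        keep = purified.argmax(1) == conf.argmax(1)
        return torch.where(keep.unsqueeze(1), purified, conf)

purifier = ConfidencePurifier(num_classes=10)
conf = F.softmax(torch.randn(8, 10), dim=1)   # target classifier outputs
loss = F.mse_loss(purifier(conf), conf)       # fit on the defender's reference set
```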


【2】 Transformer-based Hand Gesture Recognition via High-Density EMG Signals: From Instantaneous Recognition to Fusion of Motor Unit Spike Trains
Link: https://arxiv.org/abs/2212.00743

Authors: Mansooreh Montazerin, Elahe Rahimian, Farnoosh Naderkhani, S. Farokh Atashzar, Svetlana Yanushkevich, Arash Mohammadi
Affiliations: Department of Electrical and Computer Engineering, Concordia University, Montreal, QC, Canada; Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC, Canada
Abstract: Designing efficient and labor-saving prosthetic hands requires powerful hand gesture recognition algorithms that can achieve high accuracy with limited complexity and latency. In this context, the paper proposes a compact deep learning framework referred to as the CT-HGR, which employs a vision transformer network to conduct hand gesture recognition using high-density sEMG (HD-sEMG) signals. The attention mechanism in the proposed model identifies similarities among different data segments with a greater capacity for parallel computations and addresses the memory limitation problems while dealing with inputs of large sequence lengths. CT-HGR can be trained from scratch without any need for transfer learning and can simultaneously extract both temporal and spatial features of HD-sEMG data. Additionally, the CT-HGR framework can perform instantaneous recognition using sEMG images spatially composed from HD-sEMG signals. A variant of the CT-HGR is also designed to incorporate microscopic neural drive information in the form of Motor Unit Spike Trains (MUSTs) extracted from HD-sEMG signals using Blind Source Separation (BSS). This variant is combined with its baseline version via a hybrid architecture to evaluate potentials of fusing macroscopic and microscopic neural drive information. The utilized HD-sEMG dataset involves 128 electrodes that collect the signals related to 65 isometric hand gestures of 20 subjects. The proposed CT-HGR framework is applied to 31.25, 62.5, 125, and 250 ms window sizes of the above-mentioned dataset utilizing 32, 64, and 128 electrode channels. The average accuracy over all the participants using 32 electrodes and a window size of 31.25 ms is 86.23%, which gradually increases till reaching 91.98% for 128 electrodes and a window size of 250 ms. The CT-HGR achieves accuracy of 89.13% for instantaneous recognition based on a single frame of HD-sEMG image.


GAN | adversarial | attacks | generation (8 papers)

【1】 Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation
Link: https://arxiv.org/abs/2212.00774

Authors: Haochen Wang, Xiaodan Du, Jiahao Li, Raymond A. Yeh, Greg Shakhnarovich
Affiliations: TTI-Chicago; Purdue University
Comments: project page: this https URL
Abstract: A diffusion model learns to predict a vector field of gradients. We propose to apply chain rule on the learned gradients, and back-propagate the score of a diffusion model through the Jacobian of a differentiable renderer, which we instantiate to be a voxel radiance field. This setup aggregates 2D scores at multiple camera viewpoints into a 3D score, and repurposes a pretrained 2D model for 3D data generation. We identify a technical challenge of distribution mismatch that arises in this application, and propose a novel estimation mechanism to resolve it. We run our algorithm on several off-the-shelf diffusion image generative models, including the recently released Stable Diffusion trained on the large-scale LAION dataset.
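
The chain rule the abstract describes can be written directly with autograd: render the 3D representation at a viewpoint, evaluate the frozen 2D score on the image, and backpropagate that score through the renderer so the Jacobian-transposed scores accumulate on the voxels. The toy "renderer" and Gaussian score below are placeholders; only the chaining pattern is the point.

```python
import torch

def sjc_step(voxels, render, score_2d, cameras, lr=0.1):
    voxels = voxels.detach().requires_grad_(True)
    for cam in cameras:                        # aggregate over camera viewpoints
        img = render(voxels, cam)              # x = g(theta, cam), differentiable
        s = score_2d(img).detach()             # 2D score, no gradient through it
        img.backward(gradient=s)               # VJP: J_g(theta)^T s accumulates in .grad
    return (voxels + lr * voxels.grad).detach()  # ascend the aggregated 3D score

render = lambda v, axis: v.mean(dim=axis)      # toy orthographic "renderer"
score_2d = lambda img: -img                    # score of N(0, I): grad log p(x) = -x
vox = torch.randn(8, 8, 8)
vox = sjc_step(vox, render, score_2d, cameras=[0, 1, 2])
```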


【2】 Adversarial Artifact Detection in EEG-Based Brain-Computer Interfaces
Link: https://arxiv.org/abs/2212.00727

Authors: Xiaoqing Chen, Dongrui Wu
Affiliations: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
Abstract: Machine learning has achieved great success in electroencephalogram (EEG) based brain-computer interfaces (BCIs). Most existing BCI research has focused on improving accuracy, but few studies have considered security. Recent studies, however, have shown that EEG-based BCIs are vulnerable to adversarial attacks, where small perturbations added to the input can cause misclassification. Detection of adversarial examples is crucial to both the understanding of this phenomenon and defense. This paper, for the first time, explores adversarial detection in EEG-based BCIs. Experiments on two EEG datasets using three convolutional neural networks were performed to verify the performance of multiple detection approaches. We show that both white-box and black-box attacks can be detected, and that the former are easier to detect.


【3】 All You Need Is Hashing: Defending Against Data Reconstruction Attack in Vertical Federated Learning
Link: https://arxiv.org/abs/2212.00325

Authors: Pengyu Qiu, Xuhong Zhang, Shouling Ji, Yuwen Pu, Ting Wang
Abstract: Vertical federated learning is a trending solution for multi-party collaboration in training machine learning models. Industrial frameworks adopt secure multi-party computation methods such as homomorphic encryption to guarantee data security and privacy. However, a line of work has revealed that there are still leakage risks in VFL. The leakage is caused by the correlation between the intermediate representations and the raw data. Due to the powerful approximation ability of deep neural networks, an adversary can capture the correlation precisely and reconstruct the data. To deal with the threat of the data reconstruction attack, we propose a hashing-based VFL framework, called HashVFL, to cut off the reversibility directly. The one-way nature of hashing allows our framework to block all attempts to recover data from hash codes. However, integrating hashing also brings some challenges, e.g., the loss of information. This paper proposes and addresses three challenges to integrating hashing: learnability, bit balance, and consistency. Experimental results demonstrate HashVFL's efficiency in keeping the main task's performance and defending against data reconstruction attacks. Furthermore, we also analyze its potential value in detecting abnormal inputs. In addition, we conduct extensive experiments to prove HashVFL's generalization in various settings. In summary, HashVFL provides a new perspective on protecting multi-party's data security and privacy in VFL. We hope our study can attract more researchers to expand the application domains of HashVFL.
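
The one-way step at the heart of the framework can be sketched as sign binarization with a straight-through gradient, plus a bit-balance term (one of the paper's three stated challenges). This is a schematic of the idea under assumed shapes, not the HashVFL implementation.

```python
import torch

class SignHash(torch.autograd.Function):
    @staticmethod
    def forward(ctx, z):
        return torch.sign(z)                   # the one-way code that leaves the party
    @staticmethod
    def backward(ctx, g):
        return g                               # straight-through estimator

def bit_balance_loss(codes):
    # push each bit toward being +1 half the time, so codes stay informative
    return codes.mean(dim=0).abs().mean()

z = torch.randn(16, 32, requires_grad=True)    # a party's intermediate embedding
codes = SignHash.apply(z)
bit_balance_loss(codes).backward()             # gradients still reach z for training
```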


【4】 Hijack Vertical Federated Learning Models with Adversarial Embedding
Link: https://arxiv.org/abs/2212.00322

Authors: Pengyu Qiu, Xuhong Zhang, Shouling Ji, Changjiang Li, Yuwen Pu, Xing Yang, Ting Wang
Affiliations: Zhejiang University; Pennsylvania State University; National University of Defense Technology
Abstract: Vertical federated learning (VFL) is an emerging paradigm that enables collaborators to build machine learning models together in a distributed fashion. In general, these parties have a group of users in common but own different features. Existing VFL frameworks use cryptographic techniques to provide data privacy and security guarantees, leading to a line of works studying computing efficiency and fast implementation. However, the security of VFL's model remains underexplored.


【5】 FIESTA: FIber gEneration and bundle Segmentation in Tractography using Autoencoders
Link: https://arxiv.org/abs/2212.00143

Authors: Félix Dumais, Jon Haitz Legarreta, Carl Lemaire, Philippe Poulin, François Rheault, Laurent Petit, Maxime Descoteaux, Pierre-Marc Jodoin
Affiliations: Sherbrooke Connectivity Imaging Lab (SCIL); Videos & Images Theory and Analytics Lab (VITAL), Department of Computer Science, Université de Sherbrooke, Canada; Centre de Calcul Scientifique; Medical Imaging and Neuroinformatic (MINi) Lab
Comments: 26 pages, 10 figures, submitted to NeuroImage
Abstract: White matter bundle segmentation is a cornerstone of modern tractography to study the brain's structural connectivity in domains such as neurological disorders, neurosurgery, and aging. In this study, we present FIESTA (FIber gEneration and bundle Segmentation in Tractography using Autoencoders), a reliable and robust, fully automated, and easily semi-automatically calibrated pipeline based on deep autoencoders that can dissect and fully populate WM bundles. Our framework allows the transition from one anatomical bundle definition to another with marginal calibrating time. This pipeline is built upon FINTA, CINTA, and GESTA methods that demonstrated how autoencoders can be used successfully for streamline filtering, bundling, and streamline generation in tractography. Our proposed method improves bundling coverage by recovering hard-to-track bundles with generative sampling through the latent space seeding of the subject bundle and the atlas bundle. A latent space of streamlines is learned using autoencoder-based modeling combined with contrastive learning. Using an atlas of bundles in standard space (MNI), our proposed method segments new tractograms using the autoencoder latent distance between each tractogram streamline and its closest neighbor bundle in the atlas of bundles. Intra-subject bundle reliability is improved by recovering hard-to-track streamlines, using the autoencoder to generate new streamlines that increase each bundle's spatial coverage while remaining anatomically meaningful. Results show that our method is more reliable than state-of-the-art automated virtual dissection methods such as RecoBundles, RecoBundlesX, TractSeg, White Matter Analysis and XTRACT. Overall, these results show that our framework improves the practicality and usability of current state-of-the-art bundling frameworks.


【6】 Generative Adversarial Learning of Sinkhorn Algorithm Initializations
Link: https://arxiv.org/abs/2212.00133

Authors: Jonathan Geuter, Vaios Laschos
Affiliations: Department of Mathematics, Technische Universität Berlin, Germany; Weierstrass Institute
Comments: 9 pages, 7 figures
Abstract: The Sinkhorn algorithm (arXiv:1306.0895) is the state-of-the-art to compute approximations of optimal transport distances between discrete probability distributions, making use of an entropically regularized formulation of the problem. The algorithm is guaranteed to converge, no matter its initialization. This has led to little attention being paid to initializing it, and simple starting vectors like the n-dimensional one-vector are common choices. We train a neural network to compute initializations for the algorithm, which significantly outperform standard initializations. The network predicts a potential of the optimal transport dual problem, where training is conducted in an adversarial fashion using a second, generating network. The network is universal in the sense that it is able to generalize to any pair of distributions of fixed dimension. Furthermore, we show that for certain applications the network can be used independently.
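
To make the role of the initialization concrete, here is a log-domain Sinkhorn sketch whose dual potential f can be supplied externally: the paper's network would predict that f, while zeros (equivalently, the one-vector in the multiplicative form) are the usual cold start. The problem sizes and warm-start usage below are illustrative.

```python
import torch

def sinkhorn(mu, nu, C, eps=0.1, iters=100, f=None):
    f = torch.zeros_like(mu) if f is None else f
    for _ in range(iters):
        g = -eps * torch.logsumexp((f[:, None] - C) / eps + mu.log()[:, None], dim=0)
        f = -eps * torch.logsumexp((g[None, :] - C) / eps + nu.log()[None, :], dim=1)
    P = torch.exp((f[:, None] + g[None, :] - C) / eps
                  + mu.log()[:, None] + nu.log()[None, :])   # transport plan
    return (P * C).sum(), f

mu, nu = torch.full((5,), 0.2), torch.full((7,), 1 / 7)
C = torch.cdist(torch.rand(5, 1), torch.rand(7, 1))
cost_cold, f = sinkhorn(mu, nu, C, iters=5)        # few iterations from a cold start
cost_warm, _ = sinkhorn(mu, nu, C, iters=5, f=f)   # warm start: a net would predict f
```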


【7】 Scalable Pathogen Detection from Next Generation DNA Sequencing with Deep Learning
Link: https://arxiv.org/abs/2212.00015

Authors: Sai Narayanan, Sathyanarayanan N. Aakur, Priyadharsini Ramamurthy, Arunkumar Bagavathi, Vishalini Ramnath, Akhilesh Ramachandran
Affiliations: Department of Computer Science, Oklahoma State University
Comments: Expanded version of arXiv:2111.08001. Under review in a journal
Abstract: Next-generation sequencing technologies have enhanced the scope of Internet-of-Things (IoT) to include genomics for personalized medicine through the increased availability of an abundance of genome data collected from heterogeneous sources at a reduced cost. Given the sheer magnitude of the collected data and the significant challenges offered by the presence of highly similar genomic structure across species, there is a need for robust, scalable analysis platforms to extract actionable knowledge such as the presence of potentially zoonotic pathogens. The emergence of zoonotic diseases from novel pathogens, such as the influenza virus in 1918 and SARS-CoV-2 in 2019, that can jump species barriers and lead to pandemics, underscores the need for scalable metagenome analysis. In this work, we propose MG2Vec, a deep learning-based solution that uses the transformer network as its backbone, to learn robust features from raw metagenome sequences for downstream biomedical tasks such as targeted and generalized pathogen detection. Extensive experiments on four increasingly challenging, yet realistic diagnostic settings, show that the proposed approach can help detect pathogens from uncurated, real-world clinical samples with minimal human supervision in the form of labels. Further, we demonstrate that the learned representations can generalize to completely unrelated pathogens across diseases and species for large-scale metagenome analysis. We provide a comprehensive evaluation of a novel representation learning framework for metagenome-based disease diagnostics with deep learning and provide a way forward for extracting and using robust vector representations from low-cost next generation sequencing to develop generalizable diagnostic tools.


【8】 Physics-Constrained Generative Adversarial Networks for 3D Turbulence
Link: https://arxiv.org/abs/2212.00217

Authors: Dima Tretiak, Arvind T. Mohan, Daniel Livescu
Affiliations: Computational Physics and Methods, Los Alamos National Laboratory; Georgia Institute of Technology; Center for Nonlinear Studies
Abstract: Generative Adversarial Networks (GANs) have received wide acclaim among the machine learning (ML) community for their ability to generate realistic 2D images. ML is being applied more often to complex problems beyond those of computer vision. However, current frameworks often serve as black boxes and lack physics embeddings, leading to poor ability in enforcing constraints and unreliable models. In this work, we develop physics embeddings that can be stringently imposed, referred to as hard constraints, in the neural network architecture. We demonstrate their capability for 3D turbulence by embedding them in GANs, particularly to enforce the mass conservation constraint in incompressible fluid turbulence. In doing so, we also explore and contrast the effects of other methods of imposing physics constraints within the GANs framework, especially penalty-based physics constraints popular in the literature. By using physics-informed diagnostics and statistics, we evaluate the strengths and weaknesses of our approach and demonstrate its feasibility.
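
One classic way to make such a hard constraint explicit: if the generator outputs a vector potential A and the velocity field is taken as u = curl(A), then div(u) = 0 (mass conservation for incompressible flow) holds by construction rather than through a penalty. The periodic finite-difference sketch below illustrates this; grid size and spacing are arbitrary, and this is an illustration of the principle, not the paper's architecture.

```python
import torch

def d(field, dim, h=1.0):
    # central difference on a periodic grid, spacing h
    return (torch.roll(field, -1, dims=dim) - torch.roll(field, 1, dims=dim)) / (2 * h)

def curl_3d(A):
    # u = (dAz/dy - dAy/dz, dAx/dz - dAz/dx, dAy/dx - dAx/dy)
    return torch.stack([d(A[2], 1) - d(A[1], 2),
                        d(A[0], 2) - d(A[2], 0),
                        d(A[1], 0) - d(A[0], 1)])

A = torch.randn(3, 16, 16, 16)                 # generator output read as a potential
u = curl_3d(A)
div = d(u[0], 0) + d(u[1], 1) + d(u[2], 2)     # matching divergence operator
print(div.abs().max())                         # ~1e-6: conservation by construction
```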


Semi-/weakly-/un-/fully-supervised | uncertainty | active learning (4 papers)

【1】 Uniform versus uncertainty sampling: When being active is less efficient than staying passive
Link: https://arxiv.org/abs/2212.00772

Authors: Alexandru Tifrea, Jacob Clarysse, Fanny Yang
Affiliations: Department of Computer Science, ETH Zurich
Abstract: It is widely believed that given the same labeling budget, active learning algorithms like uncertainty sampling achieve better predictive performance than passive learning (i.e. uniform sampling), albeit at a higher computational cost. Recent empirical evidence suggests that this added cost might be in vain, as uncertainty sampling can sometimes perform even worse than passive learning. While existing works offer different explanations in the low-dimensional regime, this paper shows that the underlying mechanism is entirely different in high dimensions: we prove for logistic regression that passive learning outperforms uncertainty sampling even for noiseless data and when using the uncertainty of the Bayes optimal classifier. Insights from our proof indicate that this high-dimensional phenomenon is exacerbated when the separation between the classes is small. We corroborate this intuition with experiments on 20 high-dimensional datasets spanning a diverse range of applications, from finance and histology to chemistry and computer vision.
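
The two strategies being compared are easy to state in code: both start from the same labeled seed and differ only in whether the next query is drawn uniformly or at the smallest decision margin. The sketch below (scikit-learn logistic regression on synthetic small-margin data in d=500) mirrors the paper's setting only loosely; the dimensions, budget, and data are made up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learn(X, y, budget=100, seed_size=20, strategy="uncertainty"):
    rng = np.random.default_rng(0)
    labeled = list(rng.choice(len(X), seed_size, replace=False))
    pool = [i for i in range(len(X)) if i not in labeled]
    model = LogisticRegression(max_iter=1000)
    while len(labeled) < budget:
        model.fit(X[labeled], y[labeled])
        if strategy == "uniform":                         # passive learning
            pick = int(rng.choice(len(pool)))
        else:                                             # uncertainty sampling
            pick = int(np.argmin(np.abs(model.decision_function(X[pool]))))
        labeled.append(pool.pop(pick))
    return model.fit(X[labeled], y[labeled])

X = np.random.default_rng(1).normal(size=(2000, 500))     # high-dimensional regime
y = (X[:, 0] > 0).astype(int)                             # small-margin labels
for s in ("uniform", "uncertainty"):
    print(s, active_learn(X, y, strategy=s).score(X, y))
```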


【2】 ODPP: A Unified Algorithm Framework for Unsupervised Option Discovery based on Determinantal Point Process
Link: https://arxiv.org/abs/2212.00211

Authors: Jiayu Chen, Vaneet Aggarwal, Tian Lan
Abstract: Learning rich skills through temporal abstractions without supervision of external rewards is at the frontier of Reinforcement Learning research. Existing works mainly fall into two distinctive categories: variational and Laplacian-based option discovery. The former maximizes the diversity of the discovered options through a mutual information loss but overlooks coverage of the state space, while the latter focuses on improving the coverage of options by increasing connectivity during exploration, but does not consider diversity. In this paper, we propose a unified framework that quantifies diversity and coverage through a novel use of the Determinantal Point Process (DPP) and enables unsupervised option discovery explicitly optimizing both objectives. Specifically, we define the DPP kernel matrix with the Laplacian spectrum of the state transition graph and use the expected mode number in the trajectories as the objective to capture and enhance both diversity and coverage of the learned options. The proposed option discovery algorithm is extensively evaluated using challenging tasks built with Mujoco and Atari, demonstrating that our proposed algorithm substantially outperforms SOTA baselines from both diversity- and coverage-driven categories. The code is available at https://github.com/LucasCJYSDL/ODPP.


【3】 SPADE: Semi-supervised Anomaly Detection under Distribution Mismatch
Link: https://arxiv.org/abs/2212.00173

Authors: Jinsung Yoon, Kihyuk Sohn, Chun-Liang Li, Sercan O. Arik, Tomas Pfister
Affiliations: Google Cloud AI
Abstract: Semi-supervised anomaly detection is a common problem, as often the datasets containing anomalies are partially labeled. We propose a canonical framework: Semi-supervised Pseudo-labeler Anomaly Detection with Ensembling (SPADE) that isn't limited by the assumption that labeled and unlabeled data come from the same distribution. Indeed, the assumption is often violated in many applications - for example, the labeled data may contain only anomalies unlike unlabeled data, or unlabeled data may contain different types of anomalies, or labeled data may contain only 'easy-to-label' samples. SPADE utilizes an ensemble of one class classifiers as the pseudo-labeler to improve the robustness of pseudo-labeling with distribution mismatch. Partial matching is proposed to automatically select the critical hyper-parameters for pseudo-labeling without validation data, which is crucial with limited labeled data. SPADE shows state-of-the-art semi-supervised anomaly detection performance across a wide range of scenarios with distribution mismatch in both tabular and image domains. In some common real-world settings such as model facing new types of unlabeled anomalies, SPADE outperforms the state-of-the-art alternatives by 5% AUC on average.


【4】 SWL-Adapt: An Unsupervised Domain Adaptation Model with Sample Weight Learning for Cross-User Wearable Human Activity Recognition
Link: https://arxiv.org/abs/2212.00724

Authors: Rong Hu, Ling Chen, Shenghuan Miao, Xing Tang
Affiliations: College of Computer Science and Technology, Zhejiang University, Hangzhou, China; Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies, Hangzhou, China
Comments: Accepted by AAAI 2023
Abstract: In practice, Wearable Human Activity Recognition (WHAR) models usually face performance degradation on the new user due to user variance. Unsupervised domain adaptation (UDA) becomes the natural solution to cross-user WHAR under annotation scarcity. Existing UDA models usually align samples across domains without differentiation, which ignores the difference among samples. In this paper, we propose an unsupervised domain adaptation model with sample weight learning (SWL-Adapt) for cross-user WHAR. SWL-Adapt calculates sample weights according to the classification loss and domain discrimination loss of each sample with a parameterized network. We introduce the meta-optimization based update rule to learn this network end-to-end, which is guided by meta-classification loss on the selected pseudo-labeled target samples. Therefore, this network can fit a weighting function according to the cross-user WHAR task at hand, which is superior to existing sample differentiation rules fixed for special scenarios. Extensive experiments on three public WHAR datasets demonstrate that SWL-Adapt achieves state-of-the-art performance on the cross-user WHAR task, outperforming the best baseline by an average of 3.1% and 5.3% in accuracy and macro F1 score, respectively.


Transfer | Zero/Few/One-Shot | adaptation (9 papers)

【1】 Improving Zero-Shot Models with Label Distribution Priors
Link: https://arxiv.org/abs/2212.00784

Authors: Jonathan Kahana, Niv Cohen, Yedid Hoshen
Affiliations: School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel
Abstract: Labeling large image datasets with attributes such as facial age or object type is tedious and sometimes infeasible. Supervised machine learning methods provide a highly accurate solution, but require manual labels which are often unavailable. Zero-shot models (e.g., CLIP) do not require manual labels but are not as accurate as supervised ones, particularly when the attribute is numeric. We propose a new approach, CLIPPR (CLIP with Priors), which adapts zero-shot models for regression and classification on unlabelled datasets. Our method does not use any annotated images. Instead, we assume a prior over the label distribution in the dataset. We then train an adapter network on top of CLIP under two competing objectives: i) minimal change of predictions from the original CLIP model; ii) minimal distance between predicted and prior distribution of labels. Additionally, we present a novel approach for selecting prompts for Vision & Language models using a distributional prior. Our method is effective and presents a significant improvement over the original model. We demonstrate an improvement of 28% in mean absolute error on the UTK age regression task. We also present promising results for classification benchmarks, improving the classification accuracy on the ImageNet dataset by 2.83%, without using any labels.
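
The two competing objectives translate into a compact loss: a KL term keeping the adapted predictions near the original zero-shot CLIP outputs, and a KL term matching the batch-level marginal of predicted labels to the assumed prior. Function names, the uniform prior, and the weighting below are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def clippr_loss(adapted_logits, clip_logits, prior, w=1.0):
    consistency = F.kl_div(F.log_softmax(adapted_logits, dim=1),
                           F.softmax(clip_logits, dim=1), reduction="batchmean")
    marginal = F.softmax(adapted_logits, dim=1).mean(dim=0)  # predicted label dist.
    prior_match = F.kl_div(marginal.log(), prior, reduction="sum")
    return consistency + w * prior_match

clip_logits = torch.randn(32, 10)
delta = torch.zeros(32, 10, requires_grad=True)           # stand-in for the adapter
prior = torch.full((10,), 0.1)                            # e.g., a uniform label prior
clippr_loss(clip_logits + delta, clip_logits, prior).backward()
```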


【2】 Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis
Link: https://arxiv.org/abs/2212.00678

Authors: Odysseas S. Chlapanis, Georgios Paraskevopoulos, Alexandros Potamianos
Affiliations: National Technical University of Athens, Athens, Greece; Institute for Language and Speech Processing, Athena Research Center, Athens, Greece
Abstract: Multimodal learning pipelines have benefited from the success of pretrained language models. However, this comes at the cost of increased model parameters. In this work, we propose Adapted Multimodal BERT (AMB), a BERT-based architecture for multimodal tasks that uses a combination of adapter modules and intermediate fusion layers. The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations. During the adaptation process the pre-trained language model parameters remain frozen, allowing for fast, parameter-efficient training. In our ablations we see that this approach leads to efficient models, that can outperform their fine-tuned counterparts and are robust to input noise. Our experiments on sentiment analysis with CMU-MOSEI show that AMB outperforms the current state-of-the-art across metrics, with 3.4% relative reduction in the resulting error and 2.1% relative improvement in 7-class classification accuracy.


【3】 Finetune like you pretrain: Improved finetuning of zero-shot vision models
Link: https://arxiv.org/abs/2212.00638

Authors: Sachin Goyal, Ananya Kumar, Sankalp Garg, Zico Kolter, Aditi Raghunathan
Affiliations: Carnegie Mellon University; Stanford University; Bosch Center for AI
Comments: 20 pages, 7 tables, 5 figures
Abstract: Finetuning image-text models such as CLIP achieves state-of-the-art accuracies on a variety of benchmarks. However, recent works like WiseFT (Wortsman et al., 2021) and LP-FT (Kumar et al., 2022) have shown that even subtle differences in the finetuning process can lead to surprisingly large differences in the final performance, both for in-distribution (ID) and out-of-distribution (OOD) data. In this work, we show that a natural and simple approach of mimicking contrastive pretraining consistently outperforms alternative finetuning approaches. Specifically, we cast downstream class labels as text prompts and continue optimizing the contrastive loss between image embeddings and class-descriptive prompt embeddings (contrastive finetuning). Our method consistently outperforms baselines across 7 distribution shifts, 6 transfer learning, and 3 few-shot learning benchmarks. On WILDS-iWILDCam, our proposed approach FLYP outperforms the top of the leaderboard by $2.3\%$ ID and $2.7\%$ OOD, giving the highest reported accuracy. Averaged across 7 OOD datasets (2 WILDS and 5 ImageNet associated shifts), FLYP gives gains of $4.2\%$ OOD over standard finetuning and outperforms the current state of the art (LP-FT) by more than $1\%$ both ID and OOD. Similarly, on 3 few-shot learning benchmarks, our approach gives gains up to $4.6\%$ over standard finetuning and $4.4\%$ over the state of the art. In total, these benchmarks establish contrastive finetuning as a simple, intuitive, and state-of-the-art approach for supervised finetuning of image-text models like CLIP. Code is available at https://github.com/locuslab/FLYP.
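
Contrastive finetuning amounts to reusing the pretraining loss on (image, class-prompt) pairs instead of attaching a cross-entropy head. A minimal sketch (512-d embeddings assumed; the duplicate-class-within-a-batch subtlety is ignored here) looks like this:

```python
import torch
import torch.nn.functional as F

def flyp_loss(img_emb, txt_emb, temp=0.07):
    # img_emb[i] and txt_emb[i] embed a matched image and its class prompt,
    # e.g. "a photo of a {label}"
    logits = F.normalize(img_emb, dim=1) @ F.normalize(txt_emb, dim=1).T / temp
    targets = torch.arange(len(img_emb))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))   # both directions, as in pretraining

img_emb = torch.randn(16, 512, requires_grad=True)
txt_emb = torch.randn(16, 512)
flyp_loss(img_emb, txt_emb).backward()
```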


【4】 Multi-Source Survival Domain Adaptation
Link: https://arxiv.org/abs/2212.00424

Authors: Ammar Shaker, Carolin Lawrence
Affiliations: NEC Laboratories Europe GmbH, Heidelberg, Germany
Comments: 37th AAAI Conference on Artificial Intelligence, 2023. Includes appendix
Abstract: Survival analysis is the branch of statistics that studies the relation between the characteristics of living entities and their respective survival times, taking into account the partial information held by censored cases. A good analysis can, for example, determine whether one medical treatment for a group of patients is better than another. With the rise of machine learning, survival analysis can be modeled as learning a function that maps studied patients to their survival times. To succeed with that, there are three crucial issues to be tackled. First, some patient data is censored: we do not know the true survival times for all patients. Second, data is scarce, which led past research to treat different illness types as domains in a multi-task setup. Third, there is the need for adaptation to new or extremely rare illness types, where little or no labels are available. In contrast to previous multi-task setups, we want to investigate how to efficiently adapt to a new survival target domain from multiple survival source domains. For this, we introduce a new survival metric and the corresponding discrepancy measure between survival distributions. These allow us to define domain adaptation for survival analysis while incorporating censored data, which would otherwise have to be dropped. Our experiments on two cancer data sets reveal a superb performance on target domains, a better treatment recommendation, and a weight matrix with a plausible explanation.


【5】 Differentially Private Learning with Per-Sample Adaptive Clipping
Link: https://arxiv.org/abs/2212.00328

Authors: Tianyu Xia, Shuheng Shen, Su Yao, Xinyi Fu, Ke Xu, Xiaolong Xu, Xing Fu, Weiqiang Wang
Affiliations: Beijing National Research Center for Information Science and Technology (BNRist), Tsinghua University; Department of Computer Science & Technology, Tsinghua University; Zhongguancun Laboratory, Beijing
Comments: To appear in AAAI 2023
Abstract: Privacy in AI remains a topic that draws attention from researchers and the general public in recent years. As one way to implement privacy-preserving AI, differentially private learning is a framework that enables AI models to use differential privacy (DP). To achieve DP in the learning process, existing algorithms typically limit the magnitude of gradients with a constant clipping, which requires careful tuning due to its significant impact on model performance. As a solution to this issue, latest works NSGD and Auto-S innovatively propose to use normalization instead of clipping to avoid hyperparameter tuning. However, normalization-based approaches like NSGD and Auto-S rely on a monotonic weight function, which imposes excessive weight on small gradient samples and introduces extra deviation to the update. In this paper, we propose a Differentially Private Per-Sample Adaptive Clipping (DP-PSAC) algorithm based on a non-monotonic adaptive weight function, which guarantees privacy without the typical hyperparameter tuning process of using a constant clipping while significantly reducing the deviation between the update and true batch-averaged gradient. We provide a rigorous theoretical convergence analysis and show that with convergence rate at the same order, the proposed algorithm achieves a lower non-vanishing bound, which is maintained over training iterations, compared with NSGD/Auto-S. In addition, through extensive experimental evaluation, we show that DP-PSAC outperforms or matches the state-of-the-art methods on multiple main-stream vision and language tasks.
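
The difference from constant clipping and from pure normalization can be seen in one function: each per-sample gradient is rescaled by a bounded weight, so contributions stay within sensitivity C without a tuned clipping threshold, and small gradients are not inflated the way g / (||g|| + r) inflates them. The specific weight form and constants below are assumptions in the spirit of the paper, not its exact formulation.

```python
import torch

def private_grad(per_sample_grads, sigma, r=0.01, C=1.0):
    norms = per_sample_grads.flatten(1).norm(dim=1)       # ||g_i||
    w = C / (norms + r / (norms + r))                     # per-sample contribution <= C
    weighted = per_sample_grads * w.view(-1, *([1] * (per_sample_grads.dim() - 1)))
    noise = sigma * C * torch.randn_like(weighted[0])     # Gaussian mechanism
    return (weighted.sum(0) + noise) / len(weighted)

g = torch.randn(8, 100)        # 8 per-sample gradients of a 100-dim parameter
update = private_grad(g, sigma=1.0)
```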


【6】 Differentially Private Adaptive Optimization with Delayed Preconditioners
Link: https://arxiv.org/abs/2212.00309

Authors: Tian Li, Manzil Zaheer, Ken Ziyu Liu, Sashank J. Reddi, H. Brendan McMahan, Virginia Smith
Affiliations: Carnegie Mellon University; Google DeepMind; Google Research
Abstract: Privacy noise may negate the benefits of using adaptive optimizers in differentially private model training. Prior works typically address this issue by using auxiliary information (e.g., public data) to boost the effectiveness of adaptive optimization. In this work, we explore techniques to estimate and efficiently adapt to gradient geometry in private adaptive optimization without auxiliary data. Motivated by the observation that adaptive methods can tolerate stale preconditioners, we propose differentially private adaptive training with delayed preconditioners (DP^2), a simple method that constructs delayed but less noisy preconditioners to better realize the benefits of adaptivity. Theoretically, we provide convergence guarantees for our method for both convex and non-convex problems, and analyze trade-offs between delay and privacy noise reduction. Empirically, we explore DP^2 across several real-world datasets, demonstrating that it can improve convergence speed by as much as 4x relative to non-adaptive baselines and match the performance of state-of-the-art optimization methods that require auxiliary data.


【7】 AUG-FedPrompt: Practical Few-shot Federated NLP with Data-augmented Prompts
Link: https://arxiv.org/abs/2212.00192

Authors: Dongqi Cai, Yaozong Wu, Haitao Yuan, Shangguang Wang, Felix Xiaozhu Lin, Mengwei Xu
Affiliations: Beiyou Shenzhen Institute, China; University of Virginia, USA
Comments: Under review at ICASSP
Abstract: Transformer-based pre-trained models have become the de-facto solution for NLP tasks. Fine-tuning such pre-trained models for downstream tasks often requires tremendous amount of data that is both private and labeled. However, in reality: 1) such private data cannot be collected and is distributed across mobile devices, and 2) well-curated labeled data is scarce. To tackle those issues, we first define a data generator for federated few-shot learning tasks, which encompasses the quantity and distribution of scarce labeled data in a realistic setting. Then we propose AUG-FedPrompt, a prompt-based federated learning algorithm that carefully annotates abundant unlabeled data for data augmentation. AUG-FedPrompt can perform on par with full-set fine-tuning with very few initial labeled data.


【8】 Deep Learning-Based Vehicle Speed Prediction for Ecological Adaptive Cruise Control in Urban and Highway Scenarios
Link: https://arxiv.org/abs/2212.00149

Authors: Sai Krishna Chada, Daniel Görges, Achim Ebert, Roman Teutsch
Affiliations: Institute of Electromobility; Human Computer Interaction Group; Institute for Mechanical and Automotive Design, University of Kaiserslautern, Germany
Comments: Submitted to IFAC World Congress 2023
Abstract: In a typical car-following scenario, target vehicle speed fluctuations act as an external disturbance to the host vehicle and in turn affect its energy consumption. To control a host vehicle in an energy-efficient manner using model predictive control (MPC), and moreover, enhance the performance of an ecological adaptive cruise control (EACC) strategy, forecasting the future velocities of a target vehicle is essential. For this purpose, a deep recurrent neural network-based vehicle speed prediction using long-short term memory (LSTM) and gated recurrent units (GRU) is studied in this work. Besides these, the physics-based constant velocity (CV) and constant acceleration (CA) models are discussed. The sequential time series data for training (e.g. speed trajectories of the target and its preceding vehicles obtained through vehicle-to-vehicle (V2V) communication, road speed limits, traffic light current and future phases collected using vehicle-to-infrastructure (V2I) communication) is gathered from both urban and highway networks created in the microscopic traffic simulator SUMO. The proposed speed prediction models are evaluated for long-term predictions (up to 10 s) of target vehicle future velocities. Moreover, the results revealed that the LSTM-based speed predictor outperformed other models in terms of achieving better prediction accuracy on unseen test datasets, and thereby showcasing better generalization ability. Furthermore, the performance of EACC-equipped host car on the predicted velocities is evaluated, and its energy-saving benefits for different prediction horizons are presented.
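
A bare-bones sketch of such an LSTM speed predictor: encode a window of past features (speeds, limits, signal phases, ...) and regress the next-horizon speeds in one shot. The feature count, depth, and 20-step (10 s at 0.5 s) horizon are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SpeedLSTM(nn.Module):
    def __init__(self, n_feats=5, hidden=64, horizon=20):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, x):                  # x: (batch, history, feats) of past
        _, (h, _) = self.lstm(x)           # speeds, limits, signal phases, ...
        return self.head(h[-1])            # (batch, horizon) future target speeds

model = SpeedLSTM()
future_v = model(torch.randn(4, 30, 5))    # 30 past steps -> 10 s of predictions
loss = nn.functional.mse_loss(future_v, torch.zeros_like(future_v))
```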


【9】 Locally Adaptive Hierarchical Cluster Termination With Application To Individual Tree Delineation
Link: https://arxiv.org/abs/2212.00288

Authors: Ashlin Richardson, Donald Leckie
Affiliations: Department of Mathematics and Statistics, University of Victoria
Comments: 8 pages, 14 figures
Abstract: A clustering termination procedure which is locally adaptive (with respect to the hierarchical tree of sets representative of the agglomerative merging) is proposed, for agglomerative hierarchical clustering on a set equipped with a distance function. It represents a multi-scale alternative to conventional scale dependent threshold based termination criteria.


Reinforcement learning (4 papers)

【1】 Safe Reinforcement Learning with Probabilistic Control Barrier Functions for Ramp Merging
Link: https://arxiv.org/abs/2212.00618

Authors: Soumith Udatha, Yiwei Lyu, John Dolan
Affiliations: Carnegie Mellon University
Comments: Safe Learning for Autonomous Driving Workshop, ICML 2022
Abstract: Prior work has looked at applying reinforcement learning and imitation learning approaches to autonomous driving scenarios, but either the safety or the efficiency of the algorithm is compromised. With the use of control barrier functions embedded into the reinforcement learning policy, we arrive at safe policies to optimize the performance of the autonomous driving vehicle. However, control barrier functions need a good approximation of the model of the car. We use probabilistic control barrier functions as an estimate of the model uncertainty. The algorithm is implemented as an online version in the CARLA (Dosovitskiy et al., 2017) Simulator and as an offline version on a dataset extracted from the NGSIM Database. The proposed algorithm is not just a safe ramp merging algorithm but a safe autonomous driving algorithm applied to address ramp merging on highways.


【2】 Distributed Deep Reinforcement Learning: A Survey and A Multi-Player  Multi-Agent Learning Toolbox
标题:分布式深度强化学习:综述与多人多智能体学习工具箱
链接:https://arxiv.org/abs/2212.00253

作者:Qiyue Yin,Tongtong Yu,Shengqi Shen,Jun Yang,Meijing Zhao,Kaiqi Huang,Bin Liang,Liang Wang
机构:Institute of Automation (Kaiqi Huang, Liang Wang); Department of Automation, Tsinghua University (Jun Yang, Bin Liang)
备注:14 pages, 17 figures
摘要:随着AlphaGo的突破,深度强化学习成为解决序贯决策问题的公认技术。尽管深度强化学习有着良好的声誉,但其试错学习机制所导致的数据效率低下,使得其难以在广泛的领域得到实际应用。目前,已有大量的方法可以实现高效的深度强化学习,如环境建模、经验迁移、分布式修正等,其中分布式深度强化学习在人机游戏、智能交通等领域显示出了巨大的应用潜力.本文通过比较经典的分布式深度强化学习方法,研究实现高效分布式学习的重要组成部分,从单人单智能体分布式深度强化学习到最复杂的多人多智能体分布式深度强化学习,总结了这一令人兴奋的领域的研究现状。此外,我们回顾了最近发布的工具箱,这些工具箱有助于实现分布式深度强化学习,而无需对其非分布式版本进行许多修改。通过分析它们的优缺点,开发并发布了一个多人多智能体分布式深度强化学习工具箱,并在复杂游戏环境Wargame上进行了验证,表明了该工具箱在复杂游戏环境下多人多智能体分布式深度强化学习中的可用性。最后,本文指出了分布式深度强化学习面临的挑战和未来的发展趋势,希望本文能为分布式深度强化学习的研究者提供一点参考和启发。
摘要 :With the breakthrough of AlphaGo, deep reinforcement learning becomes a recognized technique for solving sequential decision-making problems. Despite its reputation, data inefficiency caused by its trial and error learning mechanism makes deep reinforcement learning hard to be practical in a wide range of areas. Plenty of methods have been developed for sample efficient deep reinforcement learning, such as environment modeling, experience transfer, and distributed modifications, amongst which, distributed deep reinforcement learning has shown its potential in various applications, such as human-computer gaming, and intelligent transportation. In this paper, we conclude the state of this exciting field, by comparing the classical distributed deep reinforcement learning methods, and studying important components to achieve efficient distributed learning, covering single player single agent distributed deep reinforcement learning to the most complex multiple players multiple agents distributed deep reinforcement learning. Furthermore, we review recently released toolboxes that help to realize distributed deep reinforcement learning without many modifications of their non-distributed versions. By analyzing their strengths and weaknesses, a multi-player multi-agent distributed deep reinforcement learning toolbox is developed and released, which is further validated on Wargame, a complex environment, showing usability of the proposed toolbox for multiple players and multiple agents distributed deep reinforcement learning under complex games. Finally, we try to point out challenges and future trends, hoping this brief review can provide a guide or a spark for researchers who are interested in distributed deep reinforcement learning.


【3】 One Risk to Rule Them All: A Risk-Sensitive Perspective on Model-Based  Offline Reinforcement Learning
标题:统领一切的一个风险:基于模型的离线强化学习的风险敏感视角
链接:https://arxiv.org/abs/2212.00124

作者:Marc Rigter,Bruno Lacerda,Nick Hawes
机构:Oxford Robotics Institute, University of Oxford
摘要:离线强化学习(RL)适用于在线探索成本过高或危险的安全关键领域。在安全关键环境中,决策应考虑灾难性后果的风险。换句话说,决策应该是风险敏感的。已有的离线RL风险研究将离线RL技术与风险敏感RL算法相结合,以避免分布漂移,实现风险敏感。在这项工作中,我们提出风险敏感性作为一种机制,以共同解决这两个问题。我们的基于模型的方法是风险规避的认识和随机不确定性。对认知不确定性的风险厌恶阻止了分布的转变,因为数据集未覆盖的区域具有较高的认知不确定性。对随机不确定性的风险厌恶会阻止由于环境随机性而导致不良结果的行为。实验结果表明,该算法在确定性基准测试上取得了较好的性能,在随机域上的风险敏感目标上优于现有算法.
摘要:Offline reinforcement learning (RL) is suitable for safety-critical domains where online exploration is too costly or dangerous. In safety-critical settings, decision-making should take into consideration the risk of catastrophic outcomes. In other words, decision-making should be risk-sensitive. Previous works on risk in offline RL combine together offline RL techniques, to avoid distributional shift, with risk-sensitive RL algorithms, to achieve risk-sensitivity. In this work, we propose risk-sensitivity as a mechanism to jointly address both of these issues. Our model-based approach is risk-averse to both epistemic and aleatoric uncertainty. Risk-aversion to epistemic uncertainty prevents distributional shift, as areas not covered by the dataset have high epistemic uncertainty. Risk-aversion to aleatoric uncertainty discourages actions that may result in poor outcomes due to environment stochasticity. Our experiments show that our algorithm achieves competitive performance on deterministic benchmarks, and outperforms existing approaches for risk-sensitive objectives in stochastic domains.
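下面用一个极简草图示意其核心思想(并非论文的具体风险度量或完整算法;集成规模、惩罚形式与系数均为假设):用动力学模型集成的成员间分歧近似认知不确定性,用各成员预测的方差近似随机不确定性,并在基于模型的回报中同时对两者做风险规避式惩罚:

```python
# 示意:同时对认知不确定性(成员间分歧)与随机不确定性(成员内方差)做惩罚
import numpy as np

def risk_averse_reward(r_pred, next_means, next_vars, c_epi=1.0, c_ale=1.0):
    """r_pred: 各成员预测的奖励 (K,);next_means/next_vars: 各成员预测的下一状态均值/方差 (K, d)"""
    epistemic = next_means.std(axis=0).mean()   # 成员间分歧大 ≈ 数据集未覆盖区域
    aleatoric = next_vars.mean()                # 环境自身的随机性
    return r_pred.mean() - c_epi * epistemic - c_ale * aleatoric

K, d = 5, 4
print(risk_averse_reward(np.random.rand(K),
                         np.random.rand(K, d), 0.1 * np.random.rand(K, d)))
```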


【4】 Autotuning PID control using Actor-Critic Deep Reinforcement Learning
标题:基于Actor-Critic深度强化学习的自整定PID控制
链接:https://arxiv.org/abs/2212.00013

作者:Vivien van Veldhuizen
机构:Bachelor thesis, Bachelor Kunstmatige Intelligentie, University of Amsterdam, Science Park, Amsterdam; Supervisor: dr. E. Bruni, Institute for Logic, Language and Computation, Amsterdam
摘要:本工作是一项探索性研究,旨在确定以何种方式可以使用强化学习来预测苹果收获机器人的最佳PID参数。为了研究这一点,在模拟的机器人手臂上实现了一种称为优势行动者评价(A2C)的算法。分别对一次一个执行器和一次两个执行器进行了整定实验,实验结果表明,该模型能够预测出比设定基准更好的PID增益。此外,研究了该模型是否能够根据苹果的位置预测PID参数。初步测试表明,该模型确实能够使其预测适应苹果的位置,使其成为一个自适应控制器。
摘要:This work is exploratory research concerned with determining in what way reinforcement learning can be used to predict optimal PID parameters for a robot designed for apple harvest. To study this, an algorithm called Advantage Actor Critic (A2C) is implemented on a simulated robot arm. The simulation primarily relies on the ROS framework. Experiments for tuning one actuator at a time and two actuators at a time are run, which both show that the model is able to predict PID gains that perform better than the set baseline. In addition, it is studied if the model is able to predict PID parameters based on where an apple is located. Initial tests show that the model is indeed able to adapt its predictions to apple locations, making it an adaptive controller.
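下面给出一个玩具级示意(一阶积分的执行器模型与奖励形式均为本文假设,并非论文的 ROS 仿真):把一组 PID 增益当作 A2C 的动作,在环境中闭环滚动一段时间,并以负的累计平方误差作为回报:

```python
# 示意:给定 (kp, ki, kd),闭环仿真一段时间并返回回报,可作为 A2C 的环境一步
def rollout_with_gains(kp, ki, kd, target=1.0, steps=200, dt=0.01):
    pos, vel, integ, prev_err, cost = 0.0, 0.0, 0.0, target, 0.0
    for _ in range(steps):
        err = target - pos
        integ += err * dt
        deriv = (err - prev_err) / dt
        u = kp * err + ki * integ + kd * deriv   # PID 控制量
        vel += u * dt                            # 一阶积分的玩具执行器模型
        pos += vel * dt
        prev_err = err
        cost += err ** 2 * dt
    return -cost                                 # 回报:跟踪误差越小越好

print(rollout_with_gains(kp=8.0, ki=0.5, kd=2.0))
```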


医学相关(3篇)

【1】 A Comprehensive Study on Machine Learning Methods to Increase the  Prediction Accuracy of Classifiers and Reduce the Number of Medical Tests  Required to Diagnose Alzheimer'S Disease
标题:提高分类器预测精度和减少诊断阿尔茨海默病所需医学测试次数的机器学习方法的综合研究
链接:https://arxiv.org/abs/2212.00414

作者:Md. Sharifur Rahman,Professor Girijesh Prasad
机构:School of Computing, Engineering, and Intelligent Systems, Ulster University, Northern Ireland, UK
备注:Presented at the 3rd International Conference on Machine Learning Techniques and Data Science (MLDS 2022)
摘要:老年痴呆症患者逐渐丧失了思考、行为和与他人互动的能力。病史、实验室检查、日常活动和性格变化都可以用来诊断这种疾病。一系列耗时和昂贵的测试被用来诊断疾病。在这项研究中,识别阿尔茨海默病最有效的方法是使用随机森林分类器,以及各种其他机器学习技术。本研究的主要目标是微调分类器,以较少的测试检测疾病,同时保持合理的疾病发现准确性。我们使用30个常用指标中的4个成功识别了近94%的病例。
摘要:Alzheimer's patients gradually lose their ability to think, behave, and interact with others. Medical history, laboratory tests, daily activities, and personality changes can all be used to diagnose the disorder. A series of time-consuming and expensive tests are used to diagnose the illness. In this study, a random-forest classifier, used along with various other machine learning techniques, proved the most effective way to identify Alzheimer's disease. The main goal of this study is to fine-tune the classifier to detect illness with fewer tests while maintaining a reasonable disease discovery accuracy. We successfully identified the condition in almost 94% of cases using four of the thirty frequently utilized indicators.
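下面是一个示意性草图(使用合成数据;论文筛选 4 个指标的具体做法摘要未给出,此处以递归特征消除作为假设性示例):

```python
# 示意:把 30 维指标筛到 4 维后训练随机森林(数据为合成,筛选方式为假设)
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=30, n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

rfe = RFE(RandomForestClassifier(n_estimators=200, random_state=0),
          n_features_to_select=4).fit(X_tr, y_tr)        # 只保留 4 个最重要的指标
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(rfe.transform(X_tr), y_tr)
print(accuracy_score(y_te, clf.predict(rfe.transform(X_te))))
```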


【2】 Cellular Automata Model for Non-Structural Proteins Comparing  Transmissibility and Pathogenesis of SARS Covid (CoV-2, CoV) and MERS Covid
标题:比较SARS冠状病毒(CoV-2,CoV)和MERS冠状病毒非结构蛋白传播性和致病机制的元胞自动机模型
链接:https://arxiv.org/abs/2212.00502

作者:Raju Hazari,Parimal Pal Chaudhuri
机构:Department of Computer Science and Engineering, National Institute of Technology Calicut, Kerala, India; Retired Professor, Indian Institute of Technology Kharagpur, India
备注:arXiv admin note: text overlap with arXiv:2202.11752
摘要:与SARS CoV(2003)相比,SARS CoV-2(2019)的传播性显著更高,这可归因于结构蛋白(刺突S、核衣壳N、膜M和包膜E)的突变以及非结构蛋白(nsp)和辅助蛋白(ORF)在病毒复制、组装和脱落中发挥的作用。非结构蛋白(nsps)利用宿主蛋白合成机制启动病毒复制,同时中和宿主免疫防御。16个nsps中的关键蛋白是非结构蛋白nsp 1,也称为前导蛋白。Nsp 1通过阻止主机转换来主导劫持主机资源的过程。本文基于细胞自动机增强的机器学习(CAML)模型,对SARS冠状病毒(CoV-2,CoV)和MERS冠状病毒的nsps进行了分析。该计算模型比较了CoV-2的结构-功能与CoV的结构-功能的偏差,其中使用从nsps的氨基酸链的CA进化导出的CAML模型参数.该比较分析指出:(i)与CoV相比,CoV-2的主要nsps具有更高的传播性,(ii)MERS冠状病毒在毒力和发病机制方面与SARS CoV存在差异。设计了一个机器学习(ML)框架,用于将CAML模型参数映射到体外/体内/计算机模拟实验研究中报告的物理域特征。ML框架使我们能够从三种病毒的16个nsps的突变研究中获得模型参数的允许范围。
摘要 :Significantly higher transmissibility of SARS CoV-2 (2019) compared to SARS CoV (2003) can be attributed to mutations of structural proteins (Spike S, Nucleocapsid N, Membrane M, and Envelope E) and the role played by non-structural proteins (nsps) and accessory proteins (ORFs) for viral replication, assembly and shedding. The non-structural proteins (nsps) avail host protein synthesis machinery to initiate viral replication, along with neutralization of host immune defense. The key protein out of the 16 nsps, is the non-structural protein nsp1, also known as the leader protein. Nsp1 leads the process of hijacking host resources by blocking host translation. This paper concentrates on the analysis of nsps of SARS covid (CoV-2, CoV) and MERS covid based on Cellular Automata enhanced Machine Learning (CAML) model developed for study of biological strings. This computational model compares deviation of structure - function of CoV-2 from that of CoV employing CAML model parameters derived out of CA evolution of amino acid chains of nsps. This comparative analysis points to - (i) higher transmissibility of CoV-2 compared to CoV for major nsps, and (ii) deviation of MERS covid from SARS CoV in respect of virulence and pathogenesis. A Machine Learning (ML) framework has been designed to map the CAML model parameters to the physical domain features reported in in-vitro/in-vivo/in-silico experimental studies. The ML framework enables us to learn the permissible range of model parameters derived out of mutational study of sixteen nsps of three viruses.


【3】 Attentional Ptycho-Tomography (APT) for three-dimensional nanoscale  X-ray imaging with minimal data acquisition and computation time
标题:注意力体层摄影术(APT)用于三维纳米级X射线成像的最小数据采集和计算时间
链接:https://arxiv.org/abs/2212.00014

作者:Iksung Kang,Ziling Wu,Yi Jiang,Yudong Yao,Junjing Deng,Jeffrey Klug,Stefan Vogt,George Barbastathis
机构:Massachusetts Institute of Technology, Cambridge, Massachusetts, USA; Argonne National Laboratory, Lemont, Illinois, USA; Singapore-MIT Alliance for Research and Technology (SMART) Centre, Singapore
备注:27 pages, 7 figures
摘要:对纳米级三维物体(例如集成电路,IC)做无创X射线成像通常需要两种扫描:一是平移式的叠层(ptychographic)扫描,返回穿过IC的复值电磁场的估计;二是从多个角度收集复值场投影的断层扫描。在这里,我们介绍注意力叠层断层成像(APT):这是一种经过训练的方法,即使测量不完整,也能用显著减少的角度扫描量给出准确的IC重建。训练过程包含基于典型IC图案和X射线传播物理的正则化先验。我们证明,角度数缩减12倍的APT实现了与使用原始角度集的金标准相当的保真度;在同一组缩减角度下,APT也优于基线重建方法。在我们的实验中,APT在不影响质量的情况下,使数据采集与计算的总开销降低了108倍。我们期望这一物理辅助的机器学习框架也能应用于纳米级成像的其他分支。
摘要:Noninvasive X-ray imaging of nanoscale three-dimensional objects, e.g. integrated circuits (ICs), generally requires two types of scanning: ptychographic, which is translational and returns estimates of complex electromagnetic field through ICs; and tomographic scanning, which collects complex field projections from multiple angles. Here, we present Attentional Ptycho-Tomography (APT), an approach trained to provide accurate reconstructions of ICs despite incomplete measurements, using a dramatically reduced amount of angular scanning. Training process includes regularizing priors based on typical IC patterns and the physics of X-ray propagation. We demonstrate that APT with 12-time reduced angles achieves fidelity comparable to the gold standard with the original set of angles. With the same set of reduced angles, APT also outperforms baseline reconstruction methods. In our experiments, APT achieves 108-time aggregate reduction in data acquisition and computation without compromising quality. We expect our physics-assisted machine learning framework could also be applied to other branches of nanoscale imaging.


蒸馏|知识提取(1篇)

【1】 Distilling Multi-Step Reasoning Capabilities of Large Language Models  into Smaller Models via Semantic Decompositions
标题:通过语义分解将大语言模型的多步推理能力提炼成小模型
链接:https://arxiv.org/abs/2212.00193

作者:Kumar Shridhar,Alessandro Stolfo,Mrinmaya Sachan
机构:Department of Computer Science, ETH Z¨urich
摘要:逐步推理方法,如思维链(CoT)已被证明是一种非常有效的技术,以诱导大型语言模型的推理能力。然而,CoT方法的成功主要取决于模型规模,通常需要数十亿个参数规模的模型才能使CoT发挥作用。本文提出了一种知识提取方法,利用大模型的逐步CoT推理能力,将这些推理能力提取到小模型中。我们的方法Decompositional Distillation学习将原始问题语义分解为一系列子问题,并使用它来训练两个模型:a)问题分解器,其学习将复杂的推理问题分解成一系列较简单的子问题,以及b)问题求解器,其使用中间子问题来求解整个问题。在一个多步数学应用题数据集(GSM 8 K)上,与CoT相比,我们的方法提取GPT-2变体时,其性能提高了35%。我们表明,使用我们的方法,可以训练GPT-2-大模型(775 M),其性能优于使用CoT推理训练的10倍大的GPT-3(6 B)模型。最后,我们还证明了我们的问题分解方法也可以用作CoT提示的替代方法,与CoT提示相比,它将GPT-3性能提高了40%。
摘要:Step-by-step reasoning approaches like chain-of-thought (CoT) have proved to be a very effective technique to induce reasoning capabilities in large language models. However, the success of the CoT approach depends primarily on model size, and often billion parameter-scale models are needed to get CoT to work. In this paper, we propose a knowledge distillation approach, that leverages the step-by-step CoT reasoning capabilities of larger models and distils these reasoning abilities into smaller models. Our approach Decompositional Distillation learns a semantic decomposition of the original problem into a sequence of subproblems and uses it to train two models: a) a problem decomposer that learns to decompose the complex reasoning problem into a sequence of simpler sub-problems and b) a problem solver that uses the intermediate subproblems to solve the overall problem. On a multi-step math word problem dataset (GSM8K), we boost the performance of GPT-2 variants up to 35% when distilled with our approach compared to CoT. We show that using our approach, it is possible to train a GPT-2-large model (775M) that can outperform a 10X larger GPT-3 (6B) model trained using CoT reasoning. Finally, we also demonstrate that our approach of problem decomposition can also be used as an alternative to CoT prompting, which boosts the GPT-3 performance by 40% compared to CoT prompts.


聚类(2篇)

【1】 Clustering and Analysis of GPS Trajectory Data using Distance-based  Features
标题:基于距离特征的GPS轨迹数据聚类与分析
链接:https://arxiv.org/abs/2212.00206

作者:Zann Koh,Yuren Zhou,Billy Pik Lik Lau,Ran Liu,Keng Hua Chong,Chau Yuen
机构:Engineering and Product Development, Singapore University of Technology and Design (SUTD), Singapore; Architecture and Sustainable Design, Singapore University of Technology and Design (SUTD), Singapore
备注:13 pages, 12 figures. To be published in IEEE Access
摘要:智能手机的普及大大增加了可用的移动数据的类型和数量,从而加速了移动研究。移动性数据的一个这样的来源是来自GPS技术,GPS技术正变得越来越普遍,并且帮助研究团体理解人的移动性模式。然而,缺乏一个标准化的框架来使用机器学习方法研究由工作和非工作用户在工作日和休息日的非工作、非家庭位置创建的不同移动模式。提出了一种新的移动性度量--每日特征距离,并将其与O-D矩阵特征一起用于生成每个用户的特征。然后,我们将这些特征与无监督机器学习方法$k$-均值聚类一起使用,并针对每种类型的日期(工作日和休息日)获得三个用户聚类。最后,提出了两个新的聚类结果分析指标,即用户公共性和平均频率。通过使用所提出的度量,可以识别感兴趣的用户行为,这有助于我们更好地理解用户的移动模式。
摘要:The proliferation of smartphones has accelerated mobility studies by largely increasing the type and volume of mobility data available. One such source of mobility data is GPS technology, which is becoming increasingly common and helps the research community understand mobility patterns of people. However, a standardized framework is lacking for studying the different mobility patterns created by the non-Work, non-Home locations of Working and Nonworking users on Workdays and Offdays using machine learning methods. We propose a new mobility metric, Daily Characteristic Distance, and use it to generate features for each user together with Origin-Destination matrix features. We then use those features with an unsupervised machine learning method, $k$-means clustering, and obtain three clusters of users for each type of day (Workday and Offday). Finally, we propose two new metrics for the analysis of the clustering results, namely User Commonality and Average Frequency. By using the proposed metrics, interesting user behaviors can be discerned, helping us to better understand the mobility patterns of the users.
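下面用一个极简草图示意按用户特征做 $k$-均值聚类的流程(特征此处用随机数占位,“每日特征距离 + O-D 矩阵统计”的具体构造从略):

```python
# 示意:每行一个用户的特征向量,标准化后聚成 3 类(与论文中每类日期各得 3 簇对应)
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

features = np.random.rand(100, 5)     # 假设 5 维:每日特征距离 + 若干 O-D 矩阵统计量
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(features))
print(np.bincount(labels))            # 各簇用户数
```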


【2】 Time-Efficient Reward Learning via Visually Assisted Cluster Ranking
标题:基于视觉辅助聚类排名的时间效率奖励学习
链接:https://arxiv.org/abs/2212.00169

作者:David Zhang,Micah Carroll,Andreea Bobu,Anca Dragan
机构:UC Berkeley
备注:Presented at the NeurIPS 2022 Human in the Loop Learning (HiLL) Workshop
摘要:奖励学习最成功的范例之一是以比较的形式使用人类反馈。虽然这些方法有希望,但人类比较标记昂贵且耗时,构成了它们更广泛应用的主要瓶颈。我们的见解是,通过将比较集中在一起进行批处理,而不是让人工对每个比较进行单独标记,我们可以大大提高这些方法中人工时间的使用效率。为此,我们利用数据降维和可视化技术来向人类提供显示状态空间的交互式GUI,其中用户可以标记状态空间的子部分。通过一些简单的Mujoco任务,我们表明,这种高级方法有希望,能够大大提高所得代理的性能,提供相同数量的人类标记时间。
摘要:One of the most successful paradigms for reward learning uses human feedback in the form of comparisons. Although these methods hold promise, human comparison labeling is expensive and time consuming, constituting a major bottleneck to their broader applicability. Our insight is that we can greatly improve how effectively human time is used in these approaches by batching comparisons together, rather than having the human label each comparison individually. To do so, we leverage data dimensionality-reduction and visualization techniques to provide the human with an interactive GUI displaying the state space, in which the user can label subportions of the state space. Across some simple Mujoco tasks, we show that this high-level approach holds promise and is able to greatly increase the performance of the resulting agents, given the same amount of human labeling time.


超分辨率|去噪|去模糊|去雾(2篇)

【1】 Denoising Diffusion for Sampling SAT Solutions
标题:采样SAT解的去噪扩散算法
链接:https://arxiv.org/abs/2212.00121

作者:Karlis Freivalds,Sergejs Kozlovics
机构:Institute of Electronics and Computer Science, Latvia; Institute of Mathematics and Computer Science, University of Latvia
备注:NeurIPS 2022 Workshop on Score-Based Methods
摘要:为布尔可满足性问题(SAT)生成不同的解是一个复杂的计算问题,在软件和硬件设计的测试和功能验证中具有实际应用。我们探索了使用去噪扩散与图神经网络耦合来实现去噪功能的方法来生成这样的解。我们发现,即使系统是用来自标准求解器的非随机解训练的,所获得的精度与当前最好的纯神经方法相似,并且所产生的SAT解是高度多样的。
摘要:Generating diverse solutions to the Boolean Satisfiability Problem (SAT) is a hard computational problem with practical applications for testing and functional verification of software and hardware designs. We explore the way to generate such solutions using Denoising Diffusion coupled with a Graph Neural Network to implement the denoising function. We find that the obtained accuracy is similar to the currently best purely neural method and the produced SAT solutions are highly diverse, even if the system is trained with non-random solutions from a standard solver.


【2】 MrSARP: A Hierarchical Deep Generative Prior for SAR Image  Super-resolution
标题:MRSARP:一种适用于SAR图像超分辨率的分层深度生成先验算法
链接:https://arxiv.org/abs/2212.00069

作者:Tushar Agarwal,Nithin Sugavanam,Emre Ertin
备注:This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
摘要:用深度学习方法训练得到的生成模型可以用作欠定逆问题(包括从稀疏测量集合成像)中的先验。本文提出了一种新的分层深度生成模型MrSARP,该模型能够联合合成不同分辨率的SAR图像。MrSARP与一个critic联合训练,该critic对多分辨率图像进行联合评分,以确定它们是否是目标在不同分辨率下的真实图像。我们展示了如何使用这种深度生成模型从同一目标的低分辨率图像中检索高空间分辨率图像。生成器的代价函数经过修改,以提高其针对给定分辨率图像集合检索输入参数的能力。我们使用评估超分辨率性能常用的三个标准误差度量在模拟数据上评估模型的性能,并将其与基于上采样和稀疏性的图像锐化方法进行比较。
摘要:Generative models learned using deep learning methods can be used as priors in under-determined inverse problems, including imaging from a sparse set of measurements. In this paper, we present a novel hierarchical deep-generative model MrSARP for SAR imagery that can synthesize SAR images of a target at different resolutions jointly. MrSARP is trained in conjunction with a critic that scores multi-resolution images jointly to decide if they are realistic images of a target at different resolutions. We show how this deep generative model can be used to retrieve the high spatial resolution image from low resolution images of the same target. The cost function of the generator is modified to improve its capability to retrieve the input parameters for a given set of resolution images. We evaluate the model's performance using the three standard error metrics used for evaluating super-resolution performance on simulated data and compare it to upsampling and sparsity based image sharpening approaches.


自动驾驶|车辆|车道检测等(1篇)

【1】 Online Learning-based Waveform Selection for Improved Vehicle  Recognition in Automotive Radar
标题:基于在线学习的波形选择改进汽车雷达车辆识别
链接:https://arxiv.org/abs/2212.00615

作者:Charles E. Thornton,William W. Howard,R. Michael Buehrer
机构: Bradley Department of ECE
备注:5 pages, 3 figures
摘要:本文描述了调频连续波(FMCW)汽车雷达系统中,基于在线强化学习的波形选择用于目标识别时的重要考虑因素与挑战。我们提出一种基于满足型(satisficing)Thompson采样的波形学习方法,能够快速识别出有望带来满意分类性能的波形。测量级仿真表明,即使雷达必须从大量候选波形中进行选择,也可以快速学到有效的波形选择策略。通过优化期望分类度量,雷达学会自适应地选择带宽(以获得合适的分辨率)和慢时间单模码(以减轻感兴趣场景中的干扰)。
摘要:This paper describes important considerations and challenges associated with online reinforcement-learning based waveform selection for target identification in frequency modulated continuous wave (FMCW) automotive radar systems. We present a novel learning approach based on satisficing Thompson sampling, which quickly identifies a waveform expected to yield satisfactory classification performance. We demonstrate through measurement-level simulations that effective waveform selection strategies can be quickly learned, even in cases where the radar must select from a large catalog of candidate waveforms. The radar learns to adaptively select a bandwidth for appropriate resolution and a slow-time unimodular code for interference mitigation in the scene of interest by optimizing an expected classification metric.
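下面用伯努利多臂老虎机给出“满足型 Thompson 采样”的一个极简示意(把每个候选波形当作一个臂,以“本次分类是否正确”为回报;阈值与候选波形数均为假设):

```python
# 示意:从 Beta 后验采样,优先选择采样成功率达到满足阈值的波形
import numpy as np

rng = np.random.default_rng(0)
K, thresh = 50, 0.9                   # 候选波形数、满足性的分类正确率阈值(假设)
alpha, beta = np.ones(K), np.ones(K)  # 每个波形的 Beta 后验参数

def select_waveform():
    theta = rng.beta(alpha, beta)             # 对每个波形采样一个成功率
    ok = np.flatnonzero(theta >= thresh)      # 满足阈值者中取第一个,否则取最大者
    return int(ok[0]) if ok.size else int(theta.argmax())

def update(k, success):                       # success: 本次目标分类是否正确 (0/1)
    alpha[k] += success
    beta[k] += 1 - success
```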


点云|SLAM|雷达|激光|深度RGBD相关(1篇)

【1】 Data-driven Science and Machine Learning Methods in Laser-Plasma Physics
标题:激光等离子体物理中的数据驱动科学和机器学习方法
链接:https://arxiv.org/abs/2212.00026

作者:Andreas Döpp,Christoph Eberle,Sunny Howard,Faran Irshad,Jinpu Lin,Matthew Streeter
机构:Ludwig–Maximilians–Universit¨at M¨unchen, Am Coulombwall , Garching, Germany, Department of Physics, Clarendon Laboratory, University of Oxford, Parks Road, Oxford OX,PU, United Kingdom
摘要:激光等离子体物理学在过去的几十年里得到了迅速的发展,因为激光已经变得更加强大和更加广泛地可用。该领域的早期实验和数值研究主要是单次实验和有限的参数探索。然而,最近的技术进步使得在实验和模拟中收集成百上千种不同设置的数据成为可能。这激发了人们对利用数学、统计学和计算机科学等先进技术处理大数据并从中受益的兴趣。同时,复杂的建模技术也为研究人员提供了新的方法来有效地处理仍然只有稀疏数据可用的情况。本文综述了机器学习方法在激光等离子体物理及其重要子领域激光等离子体加速和惯性约束聚变中的应用。
摘要:Laser-plasma physics has developed rapidly over the past few decades as lasers have become both more powerful and more widely available. Early experimental and numerical research in this field was dominated by single-shot experiments with limited parameter exploration. However, recent technological improvements make it possible to gather data for hundreds or thousands of different settings in both experiments and simulations. This has sparked interest in using advanced techniques from mathematics, statistics and computer science to deal with, and benefit from, big data. At the same time, sophisticated modeling techniques also provide new ways for researchers to deal effectively with situation where still only sparse data are available. This paper aims to present an overview of relevant machine learning methods with focus on applicability to laser-plasma physics and its important sub-fields of laser-plasma acceleration and inertial confinement fusion.


联邦学习|隐私保护|加密(2篇)

【1】 Vertical Federated Learning: A Structured Literature Review
标题:纵向联合学习:结构化文献综述
链接:https://arxiv.org/abs/2212.00622

作者 :Afsana Khan,Marijn ten Thij,Anna Wilbik
机构:Department of Advanced Computing Sciences, Maastricht University
摘要:联邦学习(FL)是一种兼具数据隐私优势、前景可期的分布式学习范式。随着数据所有者之间协作兴趣的增长,FL得到了组织机构的极大关注。FL的思想是让协作参与方在不侵犯隐私的前提下,在去中心化的数据上训练机器学习(ML)模型;简而言之,联邦学习是“把模型带到数据处,而不是把数据带到模型处”的方法。当应用于在参与方之间纵向划分的数据时,联邦学习能够通过组合各本地模型(它们仅用本地站点上具有不同特征的数据训练)来构建完整的ML模型。这种FL架构被称为纵向联邦学习(VFL),不同于水平划分数据上的传统FL,因而也带来了其自身的问题与挑战。在这篇文章中,我们给出一个结构化的文献综述,讨论VFL中的最新方法;此外,综述还梳理了VFL各项挑战的现有解决方案,并给出该领域潜在的研究方向。
摘要:Federated Learning (FL) has emerged as a promising distributed learning paradigm with an added advantage of data privacy. With the growing interest in having collaboration among data owners, FL has gained significant attention from organizations. The idea of FL is to enable collaborating participants to train machine learning (ML) models on decentralized data without breaching privacy. In simpler words, federated learning is the approach of "bringing the model to the data, instead of bringing the data to the model". Federated learning, when applied to data which is partitioned vertically across participants, is able to build a complete ML model by combining local models trained only using the data with distinct features at the local sites. This architecture of FL is referred to as vertical federated learning (VFL), which differs from the conventional FL applied to horizontally partitioned data. As VFL is different from conventional FL, it comes with its own issues and challenges. In this paper, we present a structured literature review discussing the state-of-the-art approaches in VFL. Additionally, the literature review highlights the existing solutions to challenges in VFL and provides potential research directions in this domain.


【2】 Early prediction of the risk of ICU mortality with Deep Federated  Learning
标题:深度联合学习对ICU死亡风险的早期预测
链接:https://arxiv.org/abs/2212.00554

作者:Korbinian Rand,Núria Lladós Armengol,Lena Mondrejevski,Ioanna Miliou
机构:Dept. of Computer & Systems Sciences, Stockholm University, Stockholm, Sweden
备注:12 pages
摘要:重症监护病房通常会收治有严重死亡风险的病人。最近的研究表明,机器学习能够指示患者的死亡风险,并将医生指向需要高度护理的个人。然而,医疗保健数据通常受到隐私法规的约束,因此无法轻松共享,从而无法构建使用多家医院合并数据的集中式机器学习模型。联合学习是一个机器学习框架,专为数据隐私而设计,可用于规避这一问题。在本研究中,我们评估了深度联合学习在早期预测重症监护室死亡率风险的能力。我们在AUPRC、F1-score和AUROC方面比较了联合、集中和本地机器学习的预测性能。我们的结果表明,联合学习的表现与集中式方法一样好,大大优于局部方法,从而为早期重症监护病房死亡率预测提供了一个可行的解决方案。此外,我们还表明,当患者历史窗口更接近出院或死亡时,预测性能更高。最后,我们证明了使用F1分数作为早期停止度量可以稳定和提高我们的方法对手头任务的性能。
摘要:Intensive Care Units usually carry patients with a serious risk of mortality. Recent research has shown the ability of Machine Learning to indicate the patients' mortality risk and point physicians toward individuals with a heightened need for care. Nevertheless, healthcare data is often subject to privacy regulations and can therefore not be easily shared in order to build Centralized Machine Learning models that use the combined data of multiple hospitals. Federated Learning is a Machine Learning framework designed for data privacy that can be used to circumvent this problem. In this study, we evaluate the ability of deep Federated Learning to predict the risk of Intensive Care Unit mortality at an early stage. We compare the predictive performance of Federated, Centralized, and Local Machine Learning in terms of AUPRC, F1-score, and AUROC. Our results show that Federated Learning performs equally well as the centralized approach and is substantially better than the local approach, thus providing a viable solution for early Intensive Care Unit mortality prediction. In addition, we show that the prediction performance is higher when the patient history window is closer to discharge or death. Finally, we show that using the F1-score as an early stopping metric can stabilize and increase the performance of our approach for the task at hand.
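下面是联邦平均(FedAvg)训练循环的一个极简示意(并非论文的完整流程;hospitals 与 local_train 为占位假设),展示“数据不出本地、只聚合模型参数”的思路:

```python
# 示意:按各医院样本量加权平均本地模型参数
import copy
import torch

def fed_avg(global_model, hospitals, local_train, rounds=10):
    for _ in range(rounds):
        states, weights = [], []
        for data in hospitals:                 # 每个元素是一家医院的本地数据集
            local = copy.deepcopy(global_model)
            local_train(local, data)           # 本地更新,原始数据不离开医院
            states.append(local.state_dict())
            weights.append(len(data))
        total = sum(weights)
        avg = {k: sum(w / total * s[k] for w, s in zip(weights, states))
               for k in states[0]}             # 加权平均所有参数
        global_model.load_state_dict(avg)
    return global_model
```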


推理|分析|理解|解释(6篇)

【1】 Simplifying and Understanding State Space Models with Diagonal Linear  RNNs
标题:用对角线性RNN简化和理解状态空间模型
链接:https://arxiv.org/abs/2212.00768

作者:Ankit Gupta,Harsh Mehta,Jonathan Berant
机构:Tel Aviv University, Google Research
摘要:基于线性状态空间(SSM)的序列模型最近已经成为一种有前途的架构选择,用于对各种模态中的长程依赖进行建模。然而,它们总是依赖于连续状态空间的离散化,这使得其表述和理解变得复杂。在本文中,我们去掉了离散化步骤,提出了一个基于vanilla对角线性RNN($\mathrm{DLR}$)的模型。我们的经验表明,尽管概念上简单得多,$\mathrm{DLR}$在强监督下的性能与之前提出的SSM相当。此外,我们还通过一套包含$13$个合成序列到序列任务来刻画SSM(包括$\mathrm{DLR}$)和基于注意力的模型的表达能力,这些任务涉及数万个标记之间的交互,从简单的操作(如平移输入序列)到在展平图像中检测长空间范围内相互依赖的视觉特征。我们发现,对于可以用$\textit{少数}$卷积核建模的任务,SSM能取得近乎完美的性能,但在需要$\textit{许多}$这类核的任务上却很吃力,尤其是当所需的序列操作$\textit{依赖于上下文}$时。例如,$\mathrm{DLR}$能学会将长度为$0.5M$的输入完美地平移任意数量的位置,但当平移量取决于上下文时则失败。尽管存在这些限制,$\mathrm{DLR}$在两个高阶推理任务$\mathrm{ListOpsSubTrees}$和$\mathrm{PathfinderSegmentation}\text{-}\mathrm{256}$(输入长度分别为$8K$和$65K$)上仍达到了高性能,并且在输入长度为$262K$的$\mathrm{PathfinderSegmentation}\text{-}\mathrm{512}$上给出了令人鼓舞的性能,而在该长度下注意力机制已不是可行的选择。
摘要:Sequence models based on linear state spaces (SSMs) have recently emerged as a promising choice of architecture for modeling long range dependencies across various modalities. However, they invariably rely on discretization of a continuous state space, which complicates their presentation and understanding. In this work, we dispose of the discretization step, and propose a model based on vanilla Diagonal Linear RNNs ($\mathrm{DLR}$). We empirically show that $\mathrm{DLR}$ is as performant as previously-proposed SSMs in the presence of strong supervision, despite being conceptually much simpler. Moreover, we characterize the expressivity of SSMs (including $\mathrm{DLR}$) and attention-based models via a suite of $13$ synthetic sequence-to-sequence tasks involving interactions over tens of thousands of tokens, ranging from simple operations, such as shifting an input sequence, to detecting co-dependent visual features over long spatial ranges in flattened images. We find that while SSMs report near-perfect performance on tasks that can be modeled via $\textit{few}$ convolutional kernels, they struggle on tasks requiring $\textit{many}$ such kernels and especially when the desired sequence manipulation is $\textit{context-dependent}$. For example, $\mathrm{DLR}$ learns to perfectly shift a $0.5M$-long input by an arbitrary number of positions but fails when the shift size depends on context. Despite these limitations, $\mathrm{DLR}$ reaches high performance on two higher-order reasoning tasks $\mathrm{ListOpsSubTrees}$ and $\mathrm{PathfinderSegmentation}\text{-}\mathrm{256}$ with input lengths $8K$ and $65K$ respectively, and gives encouraging performance on $\mathrm{PathfinderSegmentation}\text{-}\mathrm{512}$ with input length $262K$ for which attention is not a viable choice.
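下面给出对角线性 RNN 核心递推 $h_t=\lambda\odot h_{t-1}+Bx_t,\ y_t=\mathrm{Re}(Ch_t)$ 的一个直接实现草图(初始化与形状均为本文假设;实际中该递推可用并行扫描加速,此处只演示语义):

```python
# 示意:复数对角状态转移的逐步递推
import math
import torch

N, d_in, d_out, T, bs = 64, 8, 8, 100, 4
mag = 0.99 * torch.rand(N)                        # 模小于 1,保证稳定
phase = 2 * math.pi * torch.rand(N)
lam = mag * torch.exp(1j * phase)                 # 对角转移参数 λ ∈ C^N
B = torch.randn(N, d_in, dtype=torch.cfloat) / d_in ** 0.5
C = torch.randn(d_out, N, dtype=torch.cfloat) / N ** 0.5

x = torch.randn(bs, T, d_in)
h = torch.zeros(bs, N, dtype=torch.cfloat)
ys = []
for t in range(T):
    h = lam * h + x[:, t].to(torch.cfloat) @ B.T  # 对角(逐元素)状态更新
    ys.append((h @ C.T).real)                     # 取实部作为输出
y = torch.stack(ys, dim=1)                        # (bs, T, d_out)
```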


【2】 Explainable Artificial Intelligence for Improved Modeling of Processes
标题:用于改进过程建模的可解释人工智能
链接:https://arxiv.org/abs/2212.00695

作者:Riza Velioglu,Jan Philip Göpfert,André Artelt,Barbara Hammer
机构: CITEC, Bielefeld University, Bielefeld, Germany,  Recommendy UG, Bielefeld, Germany
备注:None
摘要:在现代业务流程中,近年来收集的数据量大幅增加。由于这些数据可能蕴含有价值的洞见,人们提出了基于流程挖掘的自动知识抽取等技术,让用户能够直观地获取其中的信息。目前,大多数技术旨在重建显式的业务流程模型:它们可以被直接解释,但在整合多样化、实值化的信息源方面能力有限。另一方面,机器学习(ML)受益于海量可用数据并能处理高维信息源,却很少被用于流程建模。在本文中,我们评估了现代Transformer架构以及较经典的ML技术对流程规律的建模能力(可通过其预测能力定量评估)。此外,我们通过突出对流程预测能力至关重要的特征,展示了注意力特性与特征相关性判定的能力。我们在五个基准数据集上验证了方法的有效性,表明ML模型能够预测关键结果,并且注意力机制或XAI组件能为底层流程提供新的洞见。
摘要 :In modern business processes, the amount of data collected has increased substantially in recent years. Because this data can potentially yield valuable insights, automated knowledge extraction based on process mining has been proposed, among other techniques, to provide users with intuitive access to the information contained therein. At present, the majority of technologies aim to reconstruct explicit business process models. These are directly interpretable but limited concerning the integration of diverse and real-valued information sources. On the other hand, Machine Learning (ML) benefits from the vast amount of data available and can deal with high-dimensional sources, yet it has rarely been applied to being used in processes. In this contribution, we evaluate the capability of modern Transformer architectures as well as more classical ML technologies of modeling process regularities, as can be quantitatively evaluated by their prediction capability. In addition, we demonstrate the capability of attentional properties and feature relevance determination by highlighting features that are crucial to the processes' predictive abilities. We demonstrate the efficacy of our approach using five benchmark datasets and show that the ML models are capable of predicting critical outcomes and that the attention mechanisms or XAI components offer new insights into the underlying processes.


【3】 Understanding the Energy Consumption of HPC Scale Artificial  Intelligence
标题:理解高性能计算规模人工智能的能耗
链接:https://arxiv.org/abs/2212.00582

作者:Danilo Carastan dos Santos
备注:None
摘要:本文旨在更好地理解高性能计算(HPC)规模的人工智能(AI)、尤其是深度学习(DL)算法在能耗方面的权衡。为此我们开发了benchmark-tracker,一个用于评估HPC环境中DL算法速度与能耗的基准工具。我们利用硬件计数器和Python库以软件方式收集能耗信息,从而对一个已知的AI基准工具进行插桩,并评估多种DL算法和模型的能耗。通过一组实验,我们以案例展示了benchmark-tracker在度量DL算法训练与推理的计算速度和能耗方面的潜力,以及其在帮助理解HPC平台上DL算法能耗行为方面的潜力。这项工作向更好地理解HPC中深度学习的能耗迈出了一步,同时也提供了一个新工具,帮助HPC DL开发者在速度与能耗之间更好地平衡HPC基础设施。
摘要:This paper contributes towards better understanding the energy consumption trade-offs of HPC scale Artificial Intelligence (AI), and more specifically Deep Learning (DL) algorithms. For this task we developed benchmark-tracker, a benchmark tool to evaluate the speed and energy consumption of DL algorithms in HPC environments. We exploited hardware counters and Python libraries to collect energy information through software, which enabled us to instrument a known AI benchmark tool, and to evaluate the energy consumption of numerous DL algorithms and models. Through an experimental campaign, we show a case example of the potential of benchmark-tracker to measure the computing speed and the energy consumption for training and inference DL algorithms, and also the potential of Benchmark-Tracker to help better understanding the energy behavior of DL algorithms in HPC platforms. This work is a step forward to better understand the energy consumption of Deep Learning in HPC, and it also contributes with a new tool to help HPC DL developers to better balance the HPC infrastructure in terms of speed and energy consumption.
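下面是一个示意性草图(并非 benchmark-tracker 的实现;假设运行在支持 Intel RAPL 的 Linux 上,powercap 路径与读取权限因机器而异),演示以软件方式读取能耗计数器来估计一段代码的能耗:

```python
# 示意:通过 Linux powercap 接口读取 RAPL 能耗计数器(单位:微焦)
import time

RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"   # 常见路径,具体以本机为准

def read_uj():
    with open(RAPL) as f:          # 可能需要相应的读取权限
        return int(f.read())

def measure(fn):
    e0, t0 = read_uj(), time.time()
    fn()
    e1, t1 = read_uj(), time.time()
    return (e1 - e0) / 1e6, t1 - t0                   # 焦耳、秒(未处理计数器回绕)

joules, secs = measure(lambda: sum(i * i for i in range(10_000_000)))
print(f"{joules:.2f} J in {secs:.2f} s")
```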


【4】 Deep neural network techniques for monaural speech enhancement: state of  the art analysis
标题:深度神经网络技术在单声道语音增强中的应用现状分析
链接:https://arxiv.org/abs/2212.00369

作者:Peter Ochieng
机构:Department of Computer Science, University of Cambridge
备注:conference
摘要:深度神经网络(DNN)技术已经在诸如自然语言处理和计算机视觉等领域变得普遍,并在机器翻译和图像生成等任务中取得了巨大的成功。由于这些成功,这类数据驱动技术也被应用于音频领域。具体地说,DNN模型已被应用于语音增强领域,以实现单声道语音增强中的去噪、去混响和多说话人分离。本文综述了几种主要的用于语音分离的DNN技术,涵盖语音增强的完整流程:从特征提取,到基于DNN的工具如何对语音的全局和局部特征建模,再到模型训练(有监督和无监督)。本文还综述了利用预训练模型来提升语音增强过程的做法。该综述旨在涵盖DNN应用于单说话人语音增强的主要趋势。
摘要:Deep neural network (DNN) techniques have become pervasive in domains such as natural language processing and computer vision. They have achieved great success in these domains in tasks such as machine translation and image generation. Due to their success, these data-driven techniques have been applied in the audio domain. More specifically, DNN models have been applied in the speech enhancement domain to achieve denoising, dereverberation and multi-speaker separation in monaural speech enhancement. In this paper, we review some dominant DNN techniques being employed to achieve speech separation. The review looks at the whole pipeline of speech enhancement, from feature extraction, to how DNN-based tools model both global and local features of speech, to model training (supervised and unsupervised). We also review the use of speech-enhancement pre-trained models to boost the speech enhancement process. The review is geared towards covering the dominant trends with regard to DNN application to the enhancement of speech obtained from a single speaker.
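综述中最常见的一类做法是时频掩蔽。下面用一个极简草图示意其骨架(掩蔽网络用占位函数代替,STFT 参数为假设):先做 STFT,由网络估计一个取值 [0,1] 的掩蔽,再乘回复谱并逆变换:

```python
# 示意:STFT -> 掩蔽 -> iSTFT 的单通道语音增强骨架
import torch

def enhance(noisy, mask_net, n_fft=512, hop=128):
    window = torch.hann_window(n_fft)
    spec = torch.stft(noisy, n_fft, hop, window=window, return_complex=True)
    mask = mask_net(spec.abs())                # (freq, time),应由训练好的 DNN 估计
    est = mask * spec                          # 掩蔽后的复谱(沿用带噪相位)
    return torch.istft(est, n_fft, hop, window=window, length=noisy.shape[-1])

# 占位的“网络”:仅作演示,真实系统中应替换为训练好的模型
enhanced = enhance(torch.randn(16000), mask_net=lambda m: torch.sigmoid(m - m.mean()))
```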


【5】 Location analysis of players in UEFA EURO 2020 and 2022 using  generalized valuation of defense by estimating probabilities
标题:基于概率估计的广义防守估值:2020与2022年欧洲杯球员位置分析
链接:https://arxiv.org/abs/2212.00021

作者:Rikuhei Umemoto,Kazushi Tsutsui,Keisuke Fujii
机构:Graduate School of Informatics, Nagoya University, Nagoya, Aichi, Japan; RIKEN Center for Advanced Intelligence Project, Fukuoka, Japan; PRESTO, Japan Science and Technology Agency, Kawaguchi, Saitama, Japan
备注:16 pages, 8 figures
摘要:分析团队运动中的防守通常是具有挑战性的,因为事件数据有限。研究人员先前提出了通过使用所有球员和球的位置来预测获得球和被攻击的事件来评估足球队防守的方法。但是,他们没有考虑到事件的重要性,假设了所有22名球员的完美观察,也没有充分调查多样性的影响(例如,国籍和性别)。在此,我们提出了一种通过对事件的预测概率进行评分来评估防守球队的方法。利用2020年欧洲杯男子和2022年欧洲杯女子足球比赛的广播视频帧中所有球员的开源位置数据,考察了球员数量对预测的影响,并通过对比赛的分析验证了该方法的有效性.结果表明:对于被攻击、得分和失球的预测,不需要所有球员的信息,而对得球的预测需要3 ~ 4名攻防球员的信息。通过比赛分析,我们解释了2020年欧洲杯决赛球队在防守方面的出色表现。我们的方法可以适用于来自足球比赛中的广播视频帧的位置数据。
摘要:Analyzing defenses in team sports is generally challenging because of the limited event data. Researchers have previously proposed methods to evaluate football team defense by predicting the events of ball gain and being attacked using locations of all players and the ball. However, they did not consider the importance of the events, assumed the perfect observation of all 22 players, and did not fully investigated the influence of the diversity (e.g., nationality and sex). Here, we propose a generalized valuation method of defensive teams by score-scaling the predicted probabilities of the events. Using the open-source location data of all players in broadcast video frames in football games of men's Euro 2020 and women's Euro 2022, we investigated the effect of the number of players on the prediction and validated our approach by analyzing the games. Results show that for the predictions of being attacked, scoring, and conceding, all players' information was not necessary, while that of ball gain required information on three to four offensive and defensive players. With game analyses we explained the excellence in defense of finalist teams in Euro 2020. Our approach might be applicable to location data from broadcast video frames in football games.


【6】 Shining light on data: Geometric data analysis through quantum dynamics
标题:照亮数据:通过量子动力学进行几何数据分析
链接:https://arxiv.org/abs/2212.00682

作者:Akshat Kumar,Mohan Sarovar
机构:Department of Mathematics, Clarkson University, Potsdam, NY, USA; Instituto de Telecomunicações, Lisbon, Portugal; Quantum Algorithms and Applications Collaboratory, Sandia National Laboratories, Livermore, California, USA
备注:Supplementary Material has high overlap with arXiv:2112.11161 by the same authors
摘要:实验科学已愈发依赖我们组织、解释和分析高维数据集的能力,这些数据集来自对受自然过程支配的大量变量的观测。自然定律、守恒原理和动力学结构在这些观测变量之间引入了复杂的相互依赖关系,进而在数据集上产生自由度更少的几何结构。我们展示了如何从数据驱动的图拉普拉斯算子和局域化波包所给出的量子力学过程的\emph{离散}近似中,提取数据中这种结构的精细特征。这一数据驱动的量子化过程引出了一种由有限数据导致的、新颖而自然的数据分析不确定性原理。我们用算法和若干真实数据应用来说明这一新方法,包括学习新冠疫情期间社交距离与出行行为中的模式和异常。
摘要:Experimental sciences have come to depend heavily on our ability to organize, interpret and analyze high-dimensional datasets produced from observations of a large number of variables governed by natural processes. Natural laws, conservation principles, and dynamical structure introduce intricate inter-dependencies among these observed variables, which in turn yield geometric structure, with fewer degrees of freedom, on the dataset. We show how fine-scale features of this structure in data can be extracted from \emph{discrete} approximations to quantum mechanical processes given by data-driven graph Laplacians and localized wavepackets. This data-driven quantization procedure leads to a novel, yet natural uncertainty principle for data analysis induced by limited data. We illustrate the new approach with algorithms and several applications to real-world data, including the learning of patterns and anomalies in social distancing and mobility behavior during the COVID-19 pandemic.


检测相关(3篇)

【1】 Attribute-based Representations for Accurate and Interpretable Video  Anomaly Detection
标题:基于属性表示的准确可解释视频异常检测
链接 :https://arxiv.org/abs/2212.00789

作者:Tal Reiss,Yedid Hoshen
机构:School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel
备注:Our code is available at this https URL
摘要:视频异常检测(VAD)是一项具有众多实际应用、极具挑战性的计算机视觉任务。由于异常本质上是模糊的,用户需要理解系统决策背后的依据,以判断其推理是否合理。本文提出一种简单而高效的方法,利用基于属性的表示同时推进VAD的准确性与可解释性。我们的方法用速度和姿态来表示每个目标,并用基于密度的方法计算异常分数。令人惊讶的是,我们发现这一简单表示足以在规模最大、最复杂的VAD数据集ShanghaiTech上取得最先进的性能。将可解释的基于属性的表示与隐式的深度表示相结合,在Ped2、Avenue和ShanghaiTech上分别取得$99.1\%$、$93.3\%$和$85.9\%$的AUROC,达到当前最佳水平。该方法准确、可解释且易于实现。
摘要:Video anomaly detection (VAD) is a challenging computer vision task with many practical applications. As anomalies are inherently ambiguous, it is essential for users to understand the reasoning behind a system's decision in order to determine if the rationale is sound. In this paper, we propose a simple but highly effective method that pushes the boundaries of VAD accuracy and interpretability using attribute-based representations. Our method represents every object by its velocity and pose. The anomaly scores are computed using a density-based approach. Surprisingly, we find that this simple representation is sufficient to achieve state-of-the-art performance in ShanghaiTech, the largest and most complex VAD dataset. Combining our interpretable attribute-based representations with implicit, deep representation yields state-of-the-art performance with a $99.1\%, 93.3\%$, and $85.9\%$ AUROC on Ped2, Avenue, and ShanghaiTech, respectively. Our method is accurate, interpretable, and easy to implement.
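下面用一个极简草图示意“属性表示 + 基于密度的打分”的思路(速度与姿态属性的提取从略,此处用随机数占位;核密度估计的带宽为假设):

```python
# 示意:在训练集(视为正常)属性上拟合 KDE,低密度样本得到高异常分
import numpy as np
from sklearn.neighbors import KernelDensity

train_attrs = np.random.rand(5000, 10)        # 每行一个目标的 [速度, 姿态] 属性向量
kde = KernelDensity(bandwidth=0.5).fit(train_attrs)

def anomaly_score(attrs):
    return -kde.score_samples(attrs)          # 负对数密度:密度越低,异常分越高

print(anomaly_score(np.random.rand(8, 10)))
```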


【2】 Soft Labels for Rapid Satellite Object Detection
标题:一种用于快速卫星目标检测的软标签
链接:https://arxiv.org/abs/2212.00585

作者:Matthew Ciolino,Grant Rosario,David Noever
机构:PeopleTec, Inc, Corporate Dr NW, Huntsville, AL , USA
备注:4 Pages, 3 Figures, 2 Tables, 21 References
摘要:图像分类中的软标签是图像真实分类的矢量表示。本文研究了卫星目标检测中的软标签问题。我们建议以检测结果作为构建软标签新数据集的基础。创建高质量模型的大部分工作在于收集和标注训练数据;如果可以用模型为我们生成数据集,我们不仅能够快速创建数据集,还可以补充现有的开源数据集。使用xView数据集的子集,我们训练YOLOv5模型来检测汽车、飞机和船只,然后用该模型为第二个训练集生成软标签,在其上训练新模型并与原始模型进行比较。我们证明了软标签可以用来训练一个几乎与在原始数据上训练的模型同样精确的模型。
摘要:Soft labels in image classification are vector representations of an image's true classification. In this paper, we investigate soft labels in the context of satellite object detection. We propose using detections as the basis for a new dataset of soft labels. Much of the effort in creating a high-quality model is gathering and annotating the training data. If we could use a model to generate a dataset for us, we could not only rapidly create datasets, but also supplement existing open-source datasets. Using a subset of the xView dataset, we train a YOLOv5 model to detect cars, planes, and ships. We then use that model to generate soft labels for the second training set which we then train and compare to the original model. We show that soft labels can be used to train a model that is almost as accurate as a model trained on the original data.


【3】 Edge Deep Learning Enabled Freezing of Gait Detection in Parkinson's  Patients
标题:基于边缘深度学习的帕金森患者步态冻结检测
链接:https://arxiv.org/abs/2212.00729

作者:Ourong Lin,Tian Yu,Yuhan Hou,Yi Zhu,Xilin Liu
机构:Department of Electrical and Computer Engineering (ECE), University of Toronto, Toronto, ON, Canada; Toronto Rehabilitation Institute, University Health Network (UHN), Toronto, ON, Canada
摘要:提出了一种用于帕金森病患者步态冻结(FoG)症状检测和报警的无线传感器网络的设计。三个传感器节点,每个都集成了一个3轴加速度计,可以放置在患者的脚踝、大腿和躯干上。每个传感器节点都可以使用设备上深度学习(DL)模型独立检测FoG,该模型采用挤压和激励卷积神经网络(CNN)。在使用公共数据集的验证中,所开发的原型实现了88.8%的FoG检测灵敏度和85.34%的F1得分,每个传感器节点使用少于20k个可训练参数。一旦检测到FoG,将产生听觉信号提醒用户,如果需要,报警信号也将发送到手机以采取进一步行动。传感器节点可以通过电感耦合轻松地无线充电。该系统是独立的,在本地处理所有用户数据,而无需将数据流传输到外部设备或云,因此消除了与无线数据传输相关的网络安全风险和功耗损失。所开发的方法可以在广泛的应用中使用。
摘要:This paper presents the design of a wireless sensor network for detecting and alerting the freezing of gait (FoG) symptoms in patients with Parkinson's disease. Three sensor nodes, each integrating a 3-axis accelerometer, can be placed on a patient at the ankle, thigh, and trunk. Each sensor node can independently detect FoG using an on-device deep learning (DL) model, featuring a squeeze-and-excitation convolutional neural network (CNN). In a validation using a public dataset, the prototype developed achieved a FoG detection sensitivity of 88.8% and an F1 score of 85.34%, using less than 20 k trainable parameters per sensor node. Once FoG is detected, an auditory signal will be generated to alert users, and the alarm signal will also be sent to mobile phones for further actions if needed. The sensor node can be easily recharged wirelessly by inductive coupling. The system is self-contained and processes all user data locally without streaming data to external devices or the cloud, thus eliminating the cybersecurity risks and power penalty associated with wireless data transmission. The developed methodology can be used in a wide range of applications.
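下面给出“挤压-激励(SE)”通道注意力块的一个 PyTorch 草图(通道数与压缩比为假设),这类模块可嵌入论文所述的一维卷积网络中处理三轴加速度特征:

```python
# 示意:一维特征上的 squeeze-and-excitation 通道注意力
import torch
import torch.nn as nn

class SEBlock1d(nn.Module):
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(),
            nn.Linear(ch // reduction, ch), nn.Sigmoid())

    def forward(self, x):                     # x: (batch, ch, time)
        w = self.fc(x.mean(dim=-1))           # 挤压:时间维全局平均;激励:通道权重
        return x * w.unsqueeze(-1)            # 逐通道重标定

x = torch.randn(2, 32, 128)                   # 两段加速度信号经卷积后的特征
out = SEBlock1d(32)(x)
```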


分类|识别(2篇)

【1】 High Dimensional Binary Classification under Label Shift: Phase  Transition and Regularization
标题:标签移位下的高维二值分类:相变和正则化
链接:https://arxiv.org/abs/2212.00700

作者:Jiahui Cheng,Minshuo Chen,Hao Liu,Tuo Zhao,Wenjing Liao
机构:School of Mathematics, Georgia Institute of Technology (Jiahui Cheng, Wenjing Liao); Department of Electrical and Computer Engineering, Princeton University (Minshuo Chen)
摘要:标签偏移通常被认为有损机器学习模型的泛化性能。研究者提出了许多缓解标签偏移影响的方法,例如对训练数据做平衡。然而,这些方法通常只考虑欠参数化情形,即样本量远大于数据维度;对过参数化情形的研究十分有限。为弥补这一空白,我们对带标签偏移的二分类问题给出了Fisher线性判别分类器的新渐近分析。具体地,我们证明存在相变现象:在某种过参数化情形下,用不平衡数据训练的分类器优于用削减后的平衡数据训练的对应分类器。此外,我们研究了正则化对标签偏移的影响:当正则化变强时,上述相变消失。
摘要:Label Shift has been widely believed to be harmful to the generalization performance of machine learning models. Researchers have proposed many approaches to mitigate the impact of the label shift, e.g., balancing the training data. However, these methods often consider the underparametrized regime, where the sample size is much larger than the data dimension. The research under the overparametrized regime is very limited. To bridge this gap, we propose a new asymptotic analysis of the Fisher Linear Discriminant classifier for binary classification with label shift. Specifically, we prove that there exists a phase transition phenomenon: Under certain overparametrized regime, the classifier trained using imbalanced data outperforms the counterpart with reduced balanced data. Moreover, we investigate the impact of regularization to the label shift: The aforementioned phase transition vanishes as the regularization becomes strong.
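作为参照,下面给出二分类 Fisher 线性判别的一个最小实现草图(含可选的岭正则项 $\lambda I$,便于观察正则化与标签偏移的相互作用;数据与训练细节为本文假设):

```python
# 示意:w ∝ Σ^{-1}(μ1 - μ0) 的 Fisher 线性判别;过参数化(d > n)时需 lam > 0
import numpy as np

def fisher_lda(X0, X1, lam=1e-3):
    mu0, mu1 = X0.mean(0), X1.mean(0)
    Xc = np.vstack([X0 - mu0, X1 - mu1])
    Sigma = Xc.T @ Xc / len(Xc) + lam * np.eye(X0.shape[1])   # 合并协方差 + 岭正则
    w = np.linalg.solve(Sigma, mu1 - mu0)
    b = -w @ (mu0 + mu1) / 2
    return lambda X: (X @ w + b > 0).astype(int)

X0 = np.random.randn(200, 50)            # 类 0(不平衡:样本多)
X1 = np.random.randn(40, 50) + 0.5       # 类 1(样本少,均值偏移)
clf = fisher_lda(X0, X1)
print(clf(np.random.randn(5, 50) + 0.5))
```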


【2】 MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech  Recognition
标题:MMSpeech:用于语音识别的多模式多任务编解码器预训练
链接:https://arxiv.org/abs/2212.00500

作者:Xiaohuan Zhou,Jiaming Wang,Zeyu Cui,Shiliang Zhang,Zhijie Yan,Jingren Zhou,Chang Zhou
机构:DAMO Academy, Alibaba Group, China
备注:Submitted to ICASSP 2023
摘要:本文提出了一种新的多模态多任务编解码器预训练框架(MMSpeech)用于汉语自动语音识别(ASR),该框架同时使用未标记语音和文本数据.语音-文本联合预训练的主要困难来自语音和文本模态的显著差异,特别是汉语语音和文本。与英语和其他使用字母书写系统的语言不同,普通话使用表意书写系统,其中字符和声音并不紧密地相互映射。因此,我们建议将音素模态引入预训练,以帮助捕获普通话语音和文本之间的模态不变信息。具体地说,我们采用了一个多任务学习框架,包括五个自我监督和监督的任务,语音和文本数据。对于端到端的预训练,我们引入了利用未标记语音和文本数据的自监督语音-伪码(S2 C)和音素-文本(P2 T)任务,其中语音-伪码对和音素-文本对是对监督语音-文本对的补充。为了训练编码器学习更好的语音表示,我们引入了自监督掩蔽语音预测(MSP)和监督音素预测(PP)任务来学习将语音映射到音素。此外,在预训练过程中直接加入下游有监督的语音到文本(S2 T)任务,进一步提高了预训练性能,无需微调即可获得更好的识别效果.在AISHELL-1上的实验结果表明,该方法与其他预训练方法相比,相对提高了40%以上,达到了最佳的性能。
摘要 :In this paper, we propose a novel multi-modal multi-task encoder-decoder pre-training framework (MMSpeech) for Mandarin automatic speech recognition (ASR), which employs both unlabeled speech and text data. The main difficulty in speech-text joint pre-training comes from the significant difference between speech and text modalities, especially for Mandarin speech and text. Unlike English and other languages with an alphabetic writing system, Mandarin uses an ideographic writing system where character and sound are not tightly mapped to one another. Therefore, we propose to introduce the phoneme modality into pre-training, which can help capture modality-invariant information between Mandarin speech and text. Specifically, we employ a multi-task learning framework including five self-supervised and supervised tasks with speech and text data. For end-to-end pre-training, we introduce self-supervised speech-to-pseudo-codes (S2C) and phoneme-to-text (P2T) tasks utilizing unlabeled speech and text data, where speech-pseudo-codes pairs and phoneme-text pairs are a supplement to the supervised speech-text pairs. To train the encoder to learn better speech representation, we introduce self-supervised masked speech prediction (MSP) and supervised phoneme prediction (PP) tasks to learn to map speech into phonemes. Besides, we directly add the downstream supervised speech-to-text (S2T) task into the pre-training process, which can further improve the pre-training performance and achieve better recognition results even without fine-tuning. Experiments on AISHELL-1 show that our proposed method achieves state-of-the-art performance, with a more than 40% relative improvement compared with other pre-training methods.


表征(3篇)

【1】 Neural Representations Reveal Distinct Modes of Class Fitting in  Residual Convolutional Networks
标题:神经表示揭示了剩余卷积网络中类拟合的不同模式
链接:https://arxiv.org/abs/2212.00771

作者:Michał Jamroż,Marcin Kurdziel
机构: AGH University of Science and Technology, Krakow, Poland
备注:Accepted to AAAI 2023. Extended version
摘要:我们利用神经表示的概率模型来研究残差网络如何拟合类。为此,我们估计了深度ResNet学习的表示的类条件密度模型。然后,我们使用这些模型来表征表征在学习类之间的分布。令人惊讶的是,我们发现所研究的模型中的类别并不是以统一的方式拟合的。相反地:我们揭示了两组具有显著不同的表示分布的类。这些不同的类拟合模式仅在所研究模型的较深层中明显,表明它们与低级别图像特征无关。我们表明,神经表征中的未覆盖结构与训练样本的记忆和对抗鲁棒性相关。最后,我们比较了记忆样本和典型样本之间神经表征的类条件分布。这使得我们能够发现在网络结构中,记忆和标准输入的类别标签出现在哪里。
摘要:We leverage probabilistic models of neural representations to investigate how residual networks fit classes. To this end, we estimate class-conditional density models for representations learned by deep ResNets. We then use these models to characterize distributions of representations across learned classes. Surprisingly, we find that classes in the investigated models are not fitted in an uniform way. On the contrary: we uncover two groups of classes that are fitted with markedly different distributions of representations. These distinct modes of class-fitting are evident only in the deeper layers of the investigated models, indicating that they are not related to low-level image features. We show that the uncovered structure in neural representations correlate with memorization of training examples and adversarial robustness. Finally, we compare class-conditional distributions of neural representations between memorized and typical examples. This allows us to uncover where in the network structure class labels arise for memorized and standard inputs.


【2】 Hyperbolic Contrastive Learning for Visual Representations beyond  Objects
标题:超越物体的视觉表征的双曲对比学习
链接:https://arxiv.org/abs/2212.00653

作者:Songwei Ge,Shlok Mishra,Simon Kornblith,Chun-Liang Li,David Jacobs
机构:University of Maryland, College Park,Google Research
摘要:尽管自监督/无监督方法已经带来视觉表征学习的快速进展,但这些方法通常以同一种视角来处理物体和场景。在这篇论文中,我们着重于学习物体和场景的表示,并保持它们之间的结构。基于视觉相似的物体在表征空间中彼此接近这一观察,我们认为场景和物体应当遵循基于其组合性的层次结构。为了利用这样的结构,我们提出了一种对比学习框架:使用欧几里得损失来学习物体表示,并使用双曲损失来鼓励场景的表示在双曲空间中靠近其组成物体的表示。这种新颖的双曲目标通过优化表示范数的大小来鼓励场景-物体之间的上位关系。我们表明,在COCO和OpenImages数据集上预训练时,双曲损失在多个数据集和任务(包括图像分类、目标检测和语义分割)上改善了多个基线的下游性能。我们还表明,所学表示的性质使我们能够以zero-shot方式解决涉及场景与物体交互的各种视觉任务。我们的代码可在 https://github.com/shlokk/HCL/tree/main/HCL 获取。
摘要:Although self-/un-supervised methods have led to rapid progress in visual representation learning, these methods generally treat objects and scenes using the same lens. In this paper, we focus on learning representations for objects and scenes that preserve the structure among them.  Motivated by the observation that visually similar objects are close in the representation space, we argue that the scenes and objects should instead follow a hierarchical structure based on their compositionality. To exploit such a structure, we propose a contrastive learning framework where a Euclidean loss is used to learn object representations and a hyperbolic loss is used to encourage representations of scenes to lie close to representations of their constituent objects in a hyperbolic space. This novel hyperbolic objective encourages the scene-object hypernymy among the representations by optimizing the magnitude of their norms. We show that when pretraining on the COCO and OpenImages datasets, the hyperbolic loss improves downstream performance of several baselines across multiple datasets and tasks, including image classification, object detection, and semantic segmentation. We also show that the properties of the learned representations allow us to solve various vision tasks that involve the interaction between scenes and objects in a zero-shot fashion. Our code can be found at \url{https://github.com/shlokk/HCL/tree/main/HCL}.
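下面给出 Poincaré 球上双曲距离的标准公式实现,以及“拉近场景表示与其组成物体表示”这一目标的一个假设性损失雏形(并非论文完整的对比学习损失):

```python
# 示意:d(u,v) = arccosh(1 + 2||u-v||^2 / ((1-||u||^2)(1-||v||^2)))
import torch

def poincare_dist(u, v, eps=1e-5):
    sq = lambda t: (t * t).sum(-1)
    num = 2 * sq(u - v)
    den = (1 - sq(u)).clamp_min(eps) * (1 - sq(v)).clamp_min(eps)
    return torch.acosh(1 + num / den)

scene = 0.2 * torch.rand(4, 16)               # 需保证范数 < 1(落在 Poincaré 球内)
objects = 0.2 * torch.rand(4, 16)
loss = poincare_dist(scene, objects).mean()   # 拉近每个场景与其组成物体的双曲距离
```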


【3】 Low-Rank Tensor Function Representation for Multi-Dimensional Data  Recovery
标题:用于多维数据恢复的低阶张量函数表示
链接:https://arxiv.org/abs/2212.00262

作者:Yisi Luo,Xile Zhao,Zhemin Li,Michael K. Ng,Deyu Meng
机构:School of Mathematical Sciences, University of Electronic Science and Technology of China (Xile Zhao)
摘要:由于高阶张量自然适合于表示真实世界中的多维数据,彩色图像和视频的低秩张量表示已经成为机器学习和计算机视觉的新兴领域之一。然而,经典的低秩张量表示由于其固有的离散性而只能表示有限meshgrid上的数据,这阻碍了其在meshgrid之外的许多场景中的潜在应用。为了突破这一障碍,提出了一种低秩张量函数表示(LRTFR),它可以连续表示网格之外的无限分辨率数据。具体地,所建议的张量函数将任意坐标映射到对应值,可以连续地表示无限实空间中的数据。与离散张量平行,我们发展了张量函数的两个基本概念,即:张量函数秩和低秩张量函数因式分解。从理论上证明了LRTFR中低秩正则化和光滑正则化的和谐统一,使得数据连续表示具有较高的有效性和效率.图像处理(图像修复和去噪)、机器学习(超参数优化)和计算机图形学(点云上采样)等领域的多维数据恢复应用证明了该方法相对于现有方法的优越性和通用性.特别是在超出原始网格分辨率(超参数优化)甚至超出网格分辨率(点云上采样)的实验中,验证了该方法在连续表示方面的良好性能。
摘要:Since higher-order tensors are naturally suitable for representing multi-dimensional data in real-world, e.g., color images and videos, low-rank tensor representation has become one of the emerging areas in machine learning and computer vision. However, classical low-rank tensor representations can only represent data on finite meshgrid due to their intrinsical discrete nature, which hinders their potential applicability in many scenarios beyond meshgrid. To break this barrier, we propose a low-rank tensor function representation (LRTFR), which can continuously represent data beyond meshgrid with infinite resolution. Specifically, the suggested tensor function, which maps an arbitrary coordinate to the corresponding value, can continuously represent data in an infinite real space. Parallel to discrete tensors, we develop two fundamental concepts for tensor functions, i.e., the tensor function rank and low-rank tensor function factorization. We theoretically justify that both low-rank and smooth regularizations are harmoniously unified in the LRTFR, which leads to high effectiveness and efficiency for data continuous representation. Extensive multi-dimensional data recovery applications arising from image processing (image inpainting and denoising), machine learning (hyperparameter optimization), and computer graphics (point cloud upsampling) substantiate the superiority and versatility of our method as compared with state-of-the-art methods. Especially, the experiments beyond the original meshgrid resolution (hyperparameter optimization) or even beyond meshgrid (point cloud upsampling) validate the favorable performances of our method for continuous representation.
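下面用一个极简草图示意“低秩张量函数”的思想:把三维场表示为单变量因子函数的低秩组合 $f(x,y,z)=\sum_{r=1}^{R} u_r(x)\,v_r(y)\,w_r(z)$;因子函数此处用小型 MLP 参数化(网络结构与秩均为本文假设),因而可在任意连续坐标处取值而不限于网格:

```python
# 示意:因子分解形式的连续张量函数表示
import torch
import torch.nn as nn

def factor(rank):                  # 单变量因子函数:R -> R^rank
    return nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, rank))

class LRTF(nn.Module):
    def __init__(self, rank=8):
        super().__init__()
        self.u, self.v, self.w = factor(rank), factor(rank), factor(rank)

    def forward(self, xyz):        # xyz: (n, 3),任意连续坐标
        u = self.u(xyz[:, :1]); v = self.v(xyz[:, 1:2]); w = self.w(xyz[:, 2:])
        return (u * v * w).sum(-1) # 按秩求和,得到 f(x, y, z)

pred = LRTF()(torch.rand(16, 3))   # 对拟合任务,可用观测值的 MSE 训练各因子网络
```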


优化|敛散性(3篇)

【1】 Prasatul Matrix: A Direct Comparison Approach for Analyzing Evolutionary  Optimization Algorithms
标题:Prasatul矩阵:一种分析进化优化算法的直接比较方法
链接:https://arxiv.org/abs/2212.00671

作者:Anupam Biswas
机构:Department of Computer Science and Engineering, National Institute of Technology Silchar
摘要:个体进化优化算法的性能主要是用均值、中值和标准差等统计量来衡量的,在用该算法的少量试验获得的最佳解上计算。为了比较两种算法的性能,比较这些统计量的值而不是直接比较解。这种比较缺乏对用不同算法获得的解的直接比较。例如,两个算法的最佳解(或最差解)的比较根本不可能。此外,尽管算法的收敛性也是一个重要因素,但算法的排名大多仅根据解的质量来进行。本文提出了一种直接比较的方法来分析进化优化算法的性能。准备了一个称为\emph{Prasatul Matrix}的直接比较矩阵,该矩阵说明了在特定试验次数下使用两种算法获得的最佳解决方案的直接比较结果。基于Prasatul矩阵,设计了5种不同的性能指标,从解的最优性和可比性两个方面来评价算法的性能。这些分数被用来开发一种分数驱动的方法,用于比较多个算法的性能以及在解质量和收敛性分析的基础上进行排序。在25个基准函数上,用6种进化优化算法对所提方法进行了分析。还进行了非参数统计分析,即Wilcoxon配对秩和检验,以验证所提出的直接比较方法的结果。
摘要:The performance of individual evolutionary optimization algorithms is mostly measured in terms of statistics such as mean, median and standard deviation, computed over the best solutions obtained from a few trials of the algorithm. To compare the performance of two algorithms, the values of these statistics are compared instead of comparing the solutions directly. This kind of comparison lacks a direct comparison of solutions obtained with different algorithms. For instance, the comparison of the best solutions (or worst solutions) of two algorithms is simply not possible. Moreover, ranking of algorithms is mostly done in terms of solution quality only, despite the fact that the convergence of an algorithm is also an important factor. In this paper, a direct comparison approach is proposed to analyze the performance of evolutionary optimization algorithms. A direct comparison matrix called the \emph{Prasatul Matrix} is prepared, which records the direct comparison outcomes of the best solutions obtained with two algorithms for a specific number of trials. Five different performance measures are designed based on the Prasatul matrix to evaluate the performance of algorithms in terms of optimality and comparability of solutions. These scores are utilized to develop a score-driven approach for comparing the performance of multiple algorithms as well as for ranking, on the grounds of both solution quality and convergence analysis. The proposed approach is analyzed with six evolutionary optimization algorithms on 25 benchmark functions. A non-parametric statistical analysis, namely the Wilcoxon paired sum-rank test, is also performed to verify the outcomes of the proposed direct comparison approach.
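下面示意“直接比较矩阵”的一种构造方式(矩阵元素的具体定义与论文可能不同,此处仅以逐对胜/平/负为假设,并针对最小化问题):

```python
# 示意:两算法各 n 次试验的最优目标值做逐对直接比较
import numpy as np

def prasatul_matrix(best_A, best_B):
    """best_A, best_B: 两算法各自 n 次试验得到的最优目标值(越小越好)"""
    a = np.asarray(best_A)[:, None]
    b = np.asarray(best_B)[None, :]
    return np.where(a < b, 1, np.where(a > b, -1, 0))   # 1: A 胜;-1: B 胜;0: 平

M = prasatul_matrix([0.8, 0.5, 0.6], [0.7, 0.7, 0.4])
print((M == 1).mean())    # A 的逐对胜率,可据此构造各类最优性/可比性得分
```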


【2】 Near Sample-Optimal Reduction-based Policy Learning for Average Reward  MDP
标题:基于近似样本最优约简的平均报酬MDP策略学习
链接:https://arxiv.org/abs/2212.00603

作者:Jinghan Wang,Mengdi Wang,Lin F. Yang
机构:Peking University, Princeton University, University of California, Los Angeles
摘要:本工作考虑在平均报酬马尔可夫决策过程(AMDP)中,给定对生成模型(模拟器)的访问时,获得$\varepsilon$-最优策略的样本复杂性。当真实MDP为弱通信时,我们证明了每个状态-动作对$\widetilde O(H \varepsilon^{-3} \ln \frac{1}{\delta})$个样本的上界,其中$H := sp(h^*)$是任一最优策略的偏差函数的跨度,$\varepsilon$是精度,$\delta$是失败概率。该界改进了[Jin & Sidford 2021]中最著名的基于混合时间的方法,后者假设每个确定性策略的混合时间有界。我们分析的核心是从AMDP问题到折扣MDP(DMDP)问题的一个恰当的归约界,其本身也可能具有独立价值,因为它允许把DMDP算法应用于其他设置下的AMDP问题。我们还证明了$\Omega(|\mathcal S| |\mathcal A| H \varepsilon^{-2} \ln \frac{1}{\delta})$总样本的极小极大下界,表明对$H$的线性依赖是必要的,并且我们的上界在$(|\mathcal S|, |\mathcal A|, H, \ln \frac{1}{\delta})$的所有参数上与下界匹配(至多相差对数因子)。
摘要:This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP), given access to a generative model (simulator). When the ground-truth MDP is weakly communicating, we prove an upper bound of $\widetilde O(H \varepsilon^{-3} \ln \frac{1}{\delta})$ samples per state-action pair, where $H := sp(h^*)$ is the span of bias of any optimal policy, $\varepsilon$ is the accuracy and $\delta$ is the failure probability. This bound improves the best-known mixing-time-based approaches in [Jin & Sidford 2021], which assume the mixing-time of every deterministic policy is bounded. The core of our analysis is a proper reduction bound from AMDP problems to discounted MDP (DMDP) problems, which may be of independent interest since it allows the application of DMDP algorithms for AMDP in other settings. We complement our upper bound by proving a minimax lower bound of $\Omega(|\mathcal S| |\mathcal A| H \varepsilon^{-2} \ln \frac{1}{\delta})$ total samples, showing that a linear dependence on $H$ is necessary and that our upper bound matches the lower bound in all parameters of $(|\mathcal S|, |\mathcal A|, H, \ln \frac{1}{\delta})$ up to some logarithmic factors.


【3】 Second-order optimization with lazy Hessians
标题:懒惰黑森人的二阶优化问题
链接:https://arxiv.org/abs/2212.00781

作者:Nikita Doikov,El Mahdi Chayti,Martin Jaggi
机构:EPFL
摘要:我们分析了采用惰性Hessian更新的牛顿法求解一般(可能非凸)优化问题。我们提出在方法的每一步计算新梯度的同时,将此前计算的Hessian复用若干次迭代,从而显著降低二阶优化方案的总体算术复杂度。借助三次正则化技术,我们证明了该方法快速全局收敛到二阶驻点,且Hessian无需每次迭代都更新。对于凸问题,我们论证了带二次正则化(更易计算)的惰性牛顿步的全局与局部超线性收敛速度。更新Hessian的最优频率是每$d$次迭代一次,其中$d$是问题的维数。这可证明地将二阶算法的总算术复杂度降低$\sqrt{d}$倍。
摘要:We analyze Newton's method with lazy Hessian updates for solving general possibly non-convex optimization problems. We propose to reuse a previously seen Hessian for several iterations while computing new gradients at each step of the method. This significantly reduces the overall arithmetical complexity of second-order optimization schemes. By using the cubic regularization technique, we establish fast global convergence of our method to a second-order stationary point, while the Hessian does not need to be updated each iteration. For convex problems, we justify global and local superlinear rates for lazy Newton steps with quadratic regularization, which is easier to compute. The optimal frequency for updating the Hessian is once every $d$ iterations, where $d$ is the dimension of the problem. This provably improves the total arithmetical complexity of second-order algorithms by a factor $\sqrt{d}$.
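下面是惰性Hessian牛顿法核心思想的最小数值示意(每$d$步才重新分解一次Hessian,期间复用旧分解;这里用简单阻尼代替论文中的三次正则化,仅为假设性简化):

```python
import numpy as np

def lazy_newton(grad, hess, x0, d, steps=60, lam=1e-3):
    """惰性Hessian牛顿法示意:每 d 步重新计算并Cholesky分解一次Hessian,
    期间复用旧分解;lam 为简化的阻尼项(论文使用三次正则化)。"""
    x = x0.copy()
    for t in range(steps):
        if t % d == 0:                       # 惰性:每 d 次迭代才更新Hessian
            H = hess(x) + lam * np.eye(x.size)
            L = np.linalg.cholesky(H)
        g = grad(x)                          # 梯度每步都重新计算
        y = np.linalg.solve(L, g)            # 复用分解做两次三角求解
        x -= np.linalg.solve(L.T, y)
    return x

# 测试:二次函数 f(x) = 0.5 x^T A x - b^T x
rng = np.random.default_rng(2)
A = rng.normal(size=(10, 10)); A = A @ A.T + 10 * np.eye(10)
b = rng.normal(size=10)
sol = lazy_newton(lambda x: A @ x - b, lambda x: A, np.zeros(10), d=10)
print(np.linalg.norm(A @ sol - b))   # 应接近 0
```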


预测|估计(4篇)

【1】 High-dimensional density estimation with tensorizing flow
标题:基于张力化流动的高维密度估计
链接:https://arxiv.org/abs/2212.00759

作者:Yinuo Ren,Hongli Zhao,Yuehaw Khoo,Lexing Ying
机构:Institute for Computational and Mathematical Engineering (ICME), Stanford University, Stanford, Department of Statistics, University of Chicago, Chicago, IL , USA, Department of Mathematics, Stanford University, Stanford, CA , USA
摘要:本文提出一种张量化流(tensorizing flow)方法,用于从观测数据估计高维概率密度函数。该方法基于张量列(tensor-train)与基于流的生成式建模。方法首先利用低维边缘分布的核密度估计,通过从一个线性系统中求解张量核,高效地构造张量列形式的近似密度;然后通过最大似然估计,训练一个从该张量列密度到观测经验分布的连续时间流模型。该方法结合了张量列无需优化的特性与基于流的生成模型的灵活性。数值结果验证了所提方法的有效性。
摘要:We propose the tensorizing flow method for estimating high-dimensional probability density functions from the observed data. The method is based on tensor-train and flow-based generative modeling. Our method first efficiently constructs an approximate density in the tensor-train form via solving the tensor cores from a linear system based on the kernel density estimators of low-dimensional marginals. We then train a continuous-time flow model from this tensor-train density to the observed empirical distribution by performing a maximum likelihood estimation. The proposed method combines the optimization-less feature of the tensor-train with the flexibility of the flow-based generative models. Numerical results are included to demonstrate the performance of the proposed method.


【2】 Incremental Predictive Coding: A Parallel and Fully Automatic Learning  Algorithm
标题:增量预测编码:一种并行全自动学习算法
链接:https://arxiv.org/abs/2212.00720

作者:Tommaso Salvatori,Yuhang Song,Beren Millidge,Zhenghua Xu,Lei Sha,Cornelius Emde,Rafal Bogacz,Thomas Lukasiewicz
机构:Department of Computer Science, University of Oxford, UK,  MRC Brain Network Dynamics Unit, University of Oxford, UK,  State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University, China
备注:18 pages, 7 figures
摘要:受神经科学启发的模型,如预测编码,有可能在机器智能的未来发挥重要作用。然而,由于一些限制,例如缺乏效率,它们还没有在工业应用中使用。在本研究中,我们通过提出增量预测编码(iPC)来解决这一问题,iPC是从增量期望最大化算法衍生的原始框架的变体,其中每个操作都可以在没有外部控制的情况下并行执行。我们从理论和实验两个方面证明,iPC比Rao和Ballard最初开发的原始算法快得多,同时在图像分类任务中保持与反向传播相当的性能。这项工作影响了多个领域,在计算神经科学和机器学习中有一般应用,在自动化和并行化很重要的场景中有特定应用,例如分布式计算和在模拟和神经形态芯片上实现深度学习模型。
摘要:Neuroscience-inspired models, such as predictive coding, have the potential to play an important role in the future of machine intelligence. However, they are not yet used in industrial applications due to some limitations, such as the lack of efficiency. In this work, we address this by proposing incremental predictive coding (iPC), a variation of the original framework derived from the incremental expectation maximization algorithm, where every operation can be performed in parallel without external control. We show both theoretically and empirically that iPC is much faster than the original algorithm originally developed by Rao and Ballard, while maintaining performance comparable to backpropagation in image classification tasks. This work impacts several areas, has general applications in computational neuroscience and machine learning, and specific applications in scenarios where automatization and parallelization are important, such as distributed computing and implementations of deep learning models on analog and neuromorphic chips.
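下面用一个两层的玩具网络示意iPC的关键点:状态与权重在每一步同时(因而可并行)更新,而不像原始预测编码那样先让推断收敛再更新权重。能量函数与步长均为示例假设:

```python
import numpy as np

rng = np.random.default_rng(3)
# 两层预测编码网络:x0(输入) -> x1(隐状态) -> y(目标)
# 能量 F = 0.5||x1 - W1 x0||^2 + 0.5||y - W2 x1||^2
W1 = rng.normal(scale=0.1, size=(16, 8))
W2 = rng.normal(scale=0.1, size=(4, 16))
x0, y = rng.normal(size=8), rng.normal(size=4)

x1 = W1 @ x0                          # 用前馈值初始化隐状态
for t in range(500):
    e1 = x1 - W1 @ x0                 # 逐层预测误差(可并行计算)
    e2 = y - W2 @ x1
    # iPC 的关键:状态与权重每一步同时更新,无需等待推断收敛
    x1 -= 0.2 * (e1 - W2.T @ e2)      # 对 x1 的能量梯度下降
    W1 += 0.02 * np.outer(e1, x0)     # 对 W1、W2 的能量梯度下降
    W2 += 0.02 * np.outer(e2, x1)
print(float(e1 @ e1 + e2 @ e2))       # 总预测误差应显著下降
```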


【3】 Sub-quadratic Algorithms for Kernel Matrices via Kernel Density  Estimation
标题:基于核密度估计的核矩阵次二次算法
链接:https://arxiv.org/abs/2212.00642

作者:Ainesh Bakshi,Piotr Indyk,Praneeth Kacham,Sandeep Silwal,Samson Zhou
机构:MIT, CMU, UC Berkeley and Rice University
摘要:核矩阵及其所表示的加权图是机器学习、统计及相关领域中无处不在的对象。使用核方法(基于核矩阵的学习与推断)的主要缺点在于效率:给定$n$个输入点,大多数基于核的算法需要先显式构造完整的$n \times n$核矩阵,从而产生$\Omega(n^2)$的运行时间,因此突破这一二次障碍一直是广泛研究的主题。我们突破了该二次障碍,为若干基本的线性代数与图处理原语给出了次二次时间算法,包括近似最大特征值与特征向量、谱稀疏化、求解线性系统、局部聚类、低秩近似、荫度估计和加权三角形计数。我们建立在最近的核密度估计(KDE)框架之上,该框架(经关于$n$次二次时间的预处理后)可以返回核矩阵行/列和的估计。特别地,我们给出了从核图上的加权点采样与加权边采样、核图上的随机游走模拟、以及矩阵上的重要性采样到核密度估计的高效归约,并证明可以在(关于分布支撑集)次线性时间内从这些分布中产生样本。这些归约是各项应用的核心部分,并可能具有独立的价值。我们在低秩近似(LRA)和谱稀疏化上实证验证了算法的有效性:LRA的核函数求值次数相比基线减少9倍,谱稀疏化的图规模缩小41倍。
摘要:Kernel matrices, as well as weighted graphs represented by them, are ubiquitous objects in machine learning, statistics and other related fields. The main drawback of using kernel methods (learning and inference using kernel matrices) is efficiency -- given $n$ input points, most kernel-based algorithms need to materialize the full $n \times n$ kernel matrix before performing any subsequent computation, thus incurring $\Omega(n^2)$ runtime. Breaking this quadratic barrier for various problems has therefore, been a subject of extensive research efforts.  We break the quadratic barrier and obtain $\textit{subquadratic}$ time algorithms for several fundamental linear-algebraic and graph processing primitives, including approximating the top eigenvalue and eigenvector, spectral sparsification, solving linear systems, local clustering, low-rank approximation, arboricity estimation and counting weighted triangles. We build on the recent Kernel Density Estimation framework, which (after preprocessing in time subquadratic in $n$) can return estimates of row/column sums of the kernel matrix. In particular, we develop efficient reductions from $\textit{weighted vertex}$ and $\textit{weighted edge sampling}$ on kernel graphs, $\textit{simulating random walks}$ on kernel graphs, and $\textit{importance sampling}$ on matrices to Kernel Density Estimation and show that we can generate samples from these distributions in $\textit{sublinear}$ (in the support of the distribution) time. Our reductions are the central ingredient in each of our applications and we believe they may be of independent interest. We empirically demonstrate the efficacy of our algorithms on low-rank approximation (LRA) and spectral sparsification, where we observe a $\textbf{9x}$ decrease in the number of kernel evaluations over baselines for LRA and a $\textbf{41x}$ reduction in the graph size for spectral sparsification.


【4】 Deep Kernel Learning for Mortality Prediction in the Face of Temporal  Shift
标题:基于深度核学习的时间漂移死亡率预测
链接:https://arxiv.org/abs/2212.00557

作者:Miguel Rios,Ameen Abu-Hanna
机构:Centre for Translation Studies, University of Vienna, Department of Medical Informatics, Amsterdam UMC, University of Amsterdam
备注:Rios, M., Abu-Hanna, A. (2021). Deep Kernel Learning for Mortality Prediction in the Face of Temporal Shift. In: Tucker, A., Henriques Abreu, P., Cardoso, J., Pereira Rodrigues, P., Riaño, D. (eds) Artificial Intelligence in Medicine. AIME 2021. Lecture Notes in Computer Science, vol 12721. Springer, Cham. this https URL
摘要:神经模型凭借其提供新颖表示的能力,在医疗预测任务中展现出良好效果。然而,患者人口构成、医疗技术和护理质量会随时间变化,这常常导致神经模型在前瞻性患者上的性能下降,尤其是校准度方面。深度核学习(DKL)框架将神经模型与能感知预测不确定性的高斯过程相结合,因而可能对此类变化具有鲁棒性。我们的假设是:分布外的测试点会得到更接近全局均值的概率,从而避免过度自信的预测;我们进而假设这会在前瞻性数据上带来更好的校准。本文研究了DKL在面对时间漂移(由更新队列数据库的信息系统自然引入)时的行为,并与基于循环神经网络的神经基线进行比较。结果表明,DKL确实给出了校准更好的预测,且其预测确实更不尖锐;此外DKL的判别能力还有所提高:其AUC为0.746(+-0.014标准差),而基线为0.739(+-0.028标准差)。本文说明了在神经计算中纳入不确定性的重要性,尤其是在前瞻性使用场景中。
摘要:Neural models, with their ability to provide novel representations, have shown promising results in prediction tasks in healthcare. However, patient demographics, medical technology, and quality of care change over time. This often leads to drop in the performance of neural models for prospective patients, especially in terms of their calibration. The deep kernel learning (DKL) framework may be robust to such changes as it combines neural models with Gaussian processes, which are aware of prediction uncertainty. Our hypothesis is that out-of-distribution test points will result in probabilities closer to the global mean and hence prevent overconfident predictions. This in turn, we hypothesise, will result in better calibration on prospective data.  This paper investigates DKL's behaviour when facing a temporal shift, which was naturally introduced when an information system that feeds a cohort database was changed. We compare DKL's performance to that of a neural baseline based on recurrent neural networks. We show that DKL indeed produced superior calibrated predictions. We also confirm that the DKL's predictions were indeed less sharp. In addition, DKL's discrimination ability was even improved: its AUC was 0.746 (+- 0.014 std), compared to 0.739 (+- 0.028 std) for the baseline. The paper demonstrated the importance of including uncertainty in neural computing, especially for their prospective use.


其他神经网络|深度学习|模型|建模(19篇)

【1】 Learning Transition Operators From Sparse Space-Time Samples
标题:从稀疏时空样本中学习转移算子
链接:https://arxiv.org/abs/2212.00746

作者:Christian Kümmerle,Mauro Maggioni,Sui Tang
机构:M. Maggioni is with the Department of Mathematics and the Department of Applied Mathematics & Statistics, Johns Hopkins University
备注:34 pages, 12 figures
摘要:我们考虑从不同时刻的部分观测值学习转移算子$\mathbf{A}$的非线性逆问题,特别是从其幂$\mathbf{A},\mathbf{A} ^2,\cdots,\mathbf {A}^{T}$的稀疏观测值学习转移算子$\mathbf {A}$。这种时空转换算子恢复问题是由最近对学习时变图信号的兴趣所激发的,所述时变图信号由依赖于底层图拓扑的图算子驱动。我们通过将问题嵌入到合适的块汉克尔矩阵的高维空间中来解决问题的非线性,在那里它变成低秩矩阵完成问题,即使$\mathbf{A}$是满秩的。对于均匀和自适应随机时空采样模型,我们通过适当度量这些块Hankel嵌入矩阵的不相干性来量化转移算子的可恢复性.对于图转移算子,这些不一致性度量取决于动力学和图拓扑之间的相互作用。我们开发了一个合适的非凸迭代重加权最小二乘(IRLS)算法,建立了它的二次局部收敛性,并表明,在最优情况下,不超过$\mathcal{O}(rn \log(nT))$的时空样本足以确保大小为$n \times n$的秩-$r$算子$\mathbf{A}$的精确恢复。这建立了空间样本可以由可比较数量的空间-时间样本代替。我们给出了一个IRLS算法的有效实现,其空间复杂度为$O(rnT)$,每次迭代的时间复杂度为$n$的线性。基于几种图模型的转移算子的数值实验表明,理论结果准确地跟踪了经验相变,说明了所提算法的适用性和可扩展性.
摘要 :We consider the nonlinear inverse problem of learning a transition operator $\mathbf{A}$ from partial observations at different times, in particular from sparse observations of entries of its powers $\mathbf{A},\mathbf{A}^2,\cdots,\mathbf{A}^{T}$. This Spatio-Temporal Transition Operator Recovery problem is motivated by the recent interest in learning time-varying graph signals that are driven by graph operators depending on the underlying graph topology. We address the nonlinearity of the problem by embedding it into a higher-dimensional space of suitable block-Hankel matrices, where it becomes a low-rank matrix completion problem, even if $\mathbf{A}$ is of full rank. For both a uniform and an adaptive random space-time sampling model, we quantify the recoverability of the transition operator via suitable measures of incoherence of these block-Hankel embedding matrices. For graph transition operators these measures of incoherence depend on the interplay between the dynamics and the graph topology. We develop a suitable non-convex iterative reweighted least squares (IRLS) algorithm, establish its quadratic local convergence, and show that, in optimal scenarios, no more than $\mathcal{O}(rn \log(nT))$ space-time samples are sufficient to ensure accurate recovery of a rank-$r$ operator $\mathbf{A}$ of size $n \times n$. This establishes that spatial samples can be substituted by a comparable number of space-time samples. We provide an efficient implementation of the proposed IRLS algorithm with space complexity of order $O(r n T)$ and per-iteration time complexity linear in $n$. Numerical experiments for transition operators based on several graph models confirm that the theoretical findings accurately track empirical phase transitions, and illustrate the applicability and scalability of the proposed algorithm.


【2】 Launchpad: Learning to Schedule Using Offline and Online RL Methods
标题:LaunchPad:学习使用离线和在线RL方法安排日程
链接:https://arxiv.org/abs/2212.00639

作者:Vanamala Venkataswamy,Jake Grigsby,Andrew Grimshaw,Yanjun Qi
机构:University of Virginia, Charlottesville, Virginia , USA, University of Texas at Austin, Austin, Texas , USA, Lancium Compute, Charlottesville, Virginia , USA
摘要:深度强化学习算法已经成功应用于多个具有挑战性的领域。经典的在线RL作业调度器可以学习有效的调度策略,但通常需要数千个时间步来探索环境并适应随机初始化的DNN策略。现有RL调度器忽视了从历史数据学习和改进自定义启发式策略的重要性。离线强化学习是一种基于预先记录的数据集,无需在线环境交互的策略优化方法.继最近数据驱动学习的成功之后,我们探索了两种RL方法:1)行为克隆(Behaviour Cloning)和2)离线RL,其目的在于从记录的数据中学习策略而不与环境交互。这些方法解决了关于数据收集成本和安全性的挑战,特别是与RL的真实世界应用相关的挑战。虽然数据驱动的RL方法产生了良好的结果,但我们表明性能高度依赖于历史数据集的质量。最后,我们证明了通过有效地结合先前的专家演示来预训练代理,我们缩短了随机探索阶段,通过在线训练来学习合理的策略。我们利用Offline RL作为\textbf {launchpad},从使用Oracle或启发式策略收集的经验中学习有效的调度策略。这样的框架对于从历史数据集进行预训练是有效的,并且非常适合于通过在线数据收集进行持续改进。
摘要:Deep reinforcement learning algorithms have succeeded in several challenging domains. Classic Online RL job schedulers can learn efficient scheduling strategies but often takes thousands of timesteps to explore the environment and adapt from a randomly initialized DNN policy. Existing RL schedulers overlook the importance of learning from historical data and improving upon custom heuristic policies. Offline reinforcement learning presents the prospect of policy optimization from pre-recorded datasets without online environment interaction. Following the recent success of data-driven learning, we explore two RL methods: 1) Behaviour Cloning and 2) Offline RL, which aim to learn policies from logged data without interacting with the environment. These methods address the challenges concerning the cost of data collection and safety, particularly pertinent to real-world applications of RL. Although the data-driven RL methods generate good results, we show that the performance is highly dependent on the quality of the historical datasets. Finally, we demonstrate that by effectively incorporating prior expert demonstrations to pre-train the agent, we short-circuit the random exploration phase to learn a reasonable policy with online training. We utilize Offline RL as a \textbf{launchpad} to learn effective scheduling policies from prior experience collected using Oracle or heuristic policies. Such a framework is effective for pre-training from historical datasets and well suited to continuous improvement with online data collection.


【3】 Probably Approximate Shapley Fairness with Applications in Machine  Learning
标题:近似Shapley公平性及其在机器学习中的应用
链接:https://arxiv.org/abs/2212.00630

作者:Zijian Zhou,Xinyi Xu,Rachael Hwee Ling Sim,Chuan Sheng Foo,Kian Hsiang Low
机构:Department of Computer Science, National University of Singapore, Singapore, Institute for Infocomm Research, A*STAR, Singapore
备注:37th AAAI Conference on Artificial Intelligence (AAAI 2023)
摘要:Shapley值(SV)因满足公平性要求而被用于机器学习的多种场景,包括数据估值、智能体估值和特征归因。然而,由于精确SV在实践中无法计算,通常只能使用SV的近似估计。这一近似步骤引出一个重要问题:SV估计是否保留了精确SV的公平性保证?我们观察到,精确SV的公平性保证对SV估计而言过于苛刻。因此,我们将Shapley公平性推广为“大概近似Shapley公平性”,并提出保真度分数(fidelity score)这一度量SV估计变差的指标,用以刻画公平性保证成立的可能性。我们最后的理论贡献是一种新的贪心主动估计(GAE)算法,它最大化最低保真度分数,并取得比事实标准的蒙特卡洛估计更好的公平性保证。我们在多个使用真实数据集的机器学习场景中实证验证:GAE在保证公平性方面优于多种现有方法,同时在估计精度上保持竞争力。
摘要:The Shapley value (SV) is adopted in various scenarios in machine learning (ML), including data valuation, agent valuation, and feature attribution, as it satisfies their fairness requirements. However, as exact SVs are infeasible to compute in practice, SV estimates are approximated instead. This approximation step raises an important question: do the SV estimates preserve the fairness guarantees of exact SVs? We observe that the fairness guarantees of exact SVs are too restrictive for SV estimates. Thus, we generalise Shapley fairness to probably approximate Shapley fairness and propose fidelity score, a metric to measure the variation of SV estimates, that determines how probable the fairness guarantees hold. Our last theoretical contribution is a novel greedy active estimation (GAE) algorithm that will maximise the lowest fidelity score and achieve a better fairness guarantee than the de facto Monte-Carlo estimation. We empirically verify GAE outperforms several existing methods in guaranteeing fairness while remaining competitive in estimation accuracy in various ML scenarios using real-world datasets.
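作为参照,下面给出摘要中提到的事实标准蒙特卡洛置换估计的最小实现(论文提出的GAE算法此处未实现),并顺带输出每个参与者边际贡献的标准误,可视作对估计变差的一种粗略度量:

```python
import numpy as np

def mc_shapley(value, n, m=2000, seed=0):
    """蒙特卡洛置换法估计Shapley值。value(S) 接受参与者下标的 frozenset,
    约定 value(空集)=0。"""
    rng = np.random.default_rng(seed)
    contrib = np.zeros((m, n))
    for k in range(m):
        perm = rng.permutation(n)
        prev, S = 0.0, set()
        for i in perm:
            S.add(i)
            cur = value(frozenset(S))
            contrib[k, i] = cur - prev   # 参与者 i 的边际贡献
            prev = cur
    est = contrib.mean(axis=0)
    se = contrib.std(axis=0, ddof=1) / np.sqrt(m)   # 估计的标准误
    return est, se

# 玩具合作博弈:v(S) = |S|^2,对称博弈的Shapley值应为 v(N)/n = 4
est, se = mc_shapley(lambda S: len(S) ** 2, n=4)
print(est, se)
```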


【4】 Rethinking Two Consensuses of the Transferability in Deep Learning
标题:关于深度学习可转移性的两种共识的再思考
链接:https://arxiv.org/abs/2212.00399

作者:Yixiong Chen,Jingxian Li,Chris Ding,Li Liu
机构: The Chinese University of Hong Kong - Shenzhen, Shenzhen, China,  Fudan University, Shanghai, China,  Shenzhen Research Institute of Big Data, Shenzhen, China
摘要:深度迁移学习(DTL)已经形成了一个长期的追求,即使深度神经网络(DNN)能够像人类一样高效地重用历史经验。这种能力被称为知识转移能力。DTL的一个常用范例是首先学习一般知识(预训练),然后针对特定目标任务重用(微调)它们。关于预训练DNN的可移植性有两个共识:(1)预训练和下游数据之间较大的域间隙带来较低的可转移性;(2)可转移性从较低层(接近输入)到较高层(接近输出)逐渐降低。但这些共识基本上都是基于自然图像的实验得出的,这限制了它们的适用范围。本文旨在从更广阔的角度对它们进行研究和补充,提出一种测量预先训练好的DNN参数可迁移性的方法。我们在12个不同的图像分类数据集上进行了实验,得到了与前人一致的结论.更重要的是,提出了两个新的发现,(1)下游目标任务的数据量大、数据集多样性大,除了领域鸿沟外,也限制了可移植性;(2)虽然较低层学习基本图像特征,但是由于它们的域敏感性,它们通常不是最可转移的层。
摘要:Deep transfer learning (DTL) has formed a long-term quest toward enabling deep neural networks (DNNs) to reuse historical experiences as efficiently as humans. This ability is named knowledge transferability. A commonly used paradigm for DTL is firstly learning general knowledge (pre-training) and then reusing (fine-tuning) them for a specific target task. There are two consensuses of transferability of pre-trained DNNs: (1) a larger domain gap between pre-training and downstream data brings lower transferability; (2) the transferability gradually decreases from lower layers (near input) to higher layers (near output). However, these consensuses were basically drawn from the experiments based on natural images, which limits their scope of application. This work aims to study and complement them from a broader perspective by proposing a method to measure the transferability of pre-trained DNN parameters. Our experiments on twelve diverse image classification datasets get similar conclusions to the previous consensuses. More importantly, two new findings are presented, i.e., (1) in addition to the domain gap, a larger data amount and huge dataset diversity of downstream target task also prohibit the transferability; (2) although the lower layers learn basic image features, they are usually not the most transferable layers due to their domain sensitivity.


【5】 Why Are Conditional Generative Models Better Than Unconditional Ones?
标题:为什么条件生成模型比无条件生成模型更好?
链接:https://arxiv.org/abs/2212.00362

作者:Fan Bao,Chongxuan Li,Jiacheng Sun,Jun Zhu
机构:Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab,Tsinghua University, Beijing, China, Gaoling School of Artificial Intelligence, Renmin University of China
摘要:大量的经验证据表明,条件生成模型通过利用数据的标签,比无条件生成模型更容易训练,性能更好。基于分数的扩散模型也是如此。本文对这一现象进行了形式化分析,指出条件学习的关键是对数据进行适当的划分。在此基础上,提出了自条件扩散模型(self-conditioned diffusion models,SCDM),该模型是在预先训练好的模型中提取的特征上,利用k-means算法对指标进行聚类得到的。SCDM显著改进了跨各种数据集的无条件模型,并在ImageNet 64 x64(无标签)上实现了破纪录的3.94 FID。此外,在CIFAR 10上,SCDM比相应的条件模型获得了稍好的FID。
摘要:Extensive empirical evidence demonstrates that conditional generative models are easier to train and perform better than unconditional ones by exploiting the labels of data. So do score-based diffusion models. In this paper, we analyze the phenomenon formally and identify that the key of conditional learning is to partition the data properly. Inspired by the analyses, we propose self-conditioned diffusion models (SCDM), which is trained conditioned on indices clustered by the k-means algorithm on the features extracted by a model pre-trained in a self-supervised manner. SCDM significantly improves the unconditional model across various datasets and achieves a record-breaking FID of 3.94 on ImageNet 64x64 without labels. Besides, SCDM achieves a slightly better FID than the corresponding conditional model on CIFAR10.
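SCDM的数据划分步骤可以用几行代码示意:对(假设已由自监督预训练编码器提取的)特征跑k-means,把聚类指标当作条件标签。下例用随机特征代替真实编码器输出,训练条件扩散模型的接口亦为假设:

```python
import numpy as np
from sklearn.cluster import KMeans

# 示意:用特征上的 k-means 聚类指标作为“自条件”标签(SCDM 的数据划分思想)。
rng = np.random.default_rng(4)
feats = rng.normal(size=(1000, 128))           # 假设为预训练自监督编码器的特征
pseudo_labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(feats)

# 之后即可把 pseudo_labels 当作类别条件来训练条件扩散模型,例如
# model(x_t, t, cond=pseudo_labels[i]);此接口仅为假设,非论文代码。
print(np.bincount(pseudo_labels))              # 各聚类(伪类别)的样本数
```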


【6】 Learning Combinatorial Structures via Markov Random Fields with Sampling  through Lovász Local Lemma
标题:基于Lovász局部引理采样的马尔可夫随机场学习组合结构
链接:https://arxiv.org/abs/2212.00296

作者:Nan Jiang,Yi Gu,Yexiang Xue
机构: Department of Computer Science, Purdue University, USA,  Department of Mathematics, Northwestern University, USA
备注:accepted by AAAI 2023
摘要:用于学习组合结构的生成模型在许多应用中具有变革性影响,但由于组合约束下学习目标的梯度估计高度难解,现有方法难以同时做到高效与准确:现有梯度估计方法容易陷入指数级的时间/内存开销,或因不当近似而产生巨大估计误差。我们提出NEural Lovasz Sampler(Nelson),一种基于Lovász局部引理(LLL)的神经网络,并证明在一定条件下它能保证从受约束马尔可夫随机场(MRF)模型的分布中生成满足组合约束的样本。我们进一步给出一个基于对比散度、完全可微的受约束MRF学习框架(Nelson-CD);其完全可微性使我们能够利用GPU的并行计算能力,从而获得很高的效率。在三个真实组合问题上的实验表明,Nelson学会生成100%有效的结构;相比之下,基线方法要么在大规模数据集上超时,要么无法生成有效结构,而Nelson随问题规模的扩展性要好得多。此外,Nelson在对数似然、MAP得分等多种学习指标上均优于基线。
摘要:Generative models for learning combinatorial structures have transformative impacts in many applications. However, existing approaches fail to offer efficient and accurate learning results. Because of the highly intractable nature of the gradient estimation of the learning objective subject to combinatorial constraints. Existing gradient estimation methods would easily run into exponential time/memory space, or incur huge estimation errors due to improper approximation. We develop NEural Lovasz Sampler (Nelson), a neural network based on Lov\'asz Local Lemma (LLL). We show it guarantees to generate samples satisfying combinatorial constraints from the distribution of the constrained Markov Random Fields model (MRF) under certain conditions. We further present a fully differentiable contrastive-divergence-based learning framework on constrained MRF (Nelson-CD). Meanwhile, Nelson-CD being fully differentiable allows us to take advantage of the parallel computing power of GPUs, resulting in great efficiency. Experimental results on three real-world combinatorial problems reveal that Nelson learns to generate 100% valid structures. In comparison, baselines either time out on large-size data sets or fail to generate valid structures, whereas Nelson scales much better with problem size. In addition, Nelson outperforms baselines in various learning metrics, such as log-likelihood and MAP scores.
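Nelson所依托的Lovász局部引理有经典的算法化形式(Moser-Tardos重采样)。下面给出该思想的最小示意:只重采样被违反约束涉及的变量,直至所有组合约束满足。这只是LLL采样思想的演示,并非论文的神经网络实现:

```python
import numpy as np

def moser_tardos(n_vars, clauses, seed=0, max_rounds=10_000):
    """Moser-Tardos 式重采样。clauses: 列表,每项为 (变量下标列表, 被禁止的取值元组),
    即该组变量取到该组合时视为违反约束。"""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n_vars)
    for _ in range(max_rounds):
        bad = [c for c in clauses if tuple(x[c[0]]) == c[1]]
        if not bad:
            return x                     # 所有约束满足
        vs, _ = bad[0]                   # 任选一个被违反的约束
        x[vs] = rng.integers(0, 2, size=len(vs))   # 只重采样其中的变量
    raise RuntimeError("未在限定轮数内收敛")

# 例:三个“不得全零”的约束
clauses = [([0, 1], (0, 0)), ([1, 2], (0, 0)), ([2, 3], (0, 0))]
print(moser_tardos(4, clauses))
```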


【7】 The Effect of Data Dimensionality on Neural Network Prunability
标题:数据维度对神经网络可剪缩性的影响
链接:https://arxiv.org/abs/2212.00291

作者:Zachary Ankner,Alex Renda,Gintare Karolina Dziugaite,Jonathan Frankle,Tian Jin
机构:Massachusetts Institute of Technology, MosaicML, Google Research, Brain Team
摘要:实践者修剪神经网络是为了提高效率和泛化能力,但很少仔细检查决定神经网络可修剪性的因素,即修剪在不损害模型测试精度的情况下可以去除的权重的最大部分。在这项工作中,我们研究了输入数据的性质,可能有助于神经网络的修剪。对于诸如图像、文本和音频的高维输入数据,流形假设暗示这些高维输入近似地位于显著较低维度的流形上或附近。现有工作表明,输入数据的潜在低维结构可能影响学习的样本效率。本文研究了输入数据的低维结构是否影响神经网络的可修剪性。
摘要:Practitioners prune neural networks for efficiency gains and generalization improvements, but few scrutinize the factors determining the prunability of a neural network: the maximum fraction of weights that pruning can remove without compromising the model's test accuracy. In this work, we study the properties of input data that may contribute to the prunability of a neural network. For high dimensional input data such as images, text, and audio, the manifold hypothesis suggests that these high dimensional inputs approximately lie on or near a significantly lower dimensional manifold. Prior work demonstrates that the underlying low dimensional structure of the input data may affect the sample efficiency of learning. In this paper, we investigate whether the low dimensional structure of the input data affects the prunability of a neural network.
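“可修剪性”可以用最简单的全局幅值剪枝来量化:对训练好的网络按权重绝对值全局取阈值,扫描稀疏度并观察测试精度何时开始下降。下面是基于sklearn的小型示意(论文研究的是输入数据维度如何影响这条曲线):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
Xtr, Xte, ytr, yte = X[:1500], X[1500:], y[:1500], y[1500:]
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0).fit(Xtr, ytr)

for sparsity in [0.0, 0.5, 0.8, 0.9, 0.95]:
    Ws = [W.copy() for W in clf.coefs_]                # 备份原权重
    all_w = np.concatenate([np.abs(W).ravel() for W in Ws])
    thresh = np.quantile(all_w, sparsity)              # 全局幅值阈值
    clf.coefs_ = [np.where(np.abs(W) < thresh, 0.0, W) for W in Ws]
    print(sparsity, clf.score(Xte, yte))               # 稀疏度 vs 测试精度
    clf.coefs_ = Ws                                    # 还原权重
```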


【8】 Task Discovery: Finding the Tasks that Neural Networks Generalize on
标题:任务发现:寻找神经网络泛化的任务
链接:https://arxiv.org/abs/2212.00261

作者:Andrei Atanov,Andrei Filatov,Teresa Yeo,Ajay Sohmshetty,Amir Zamir
机构:Swiss Federal Institute of Technology (EPFL)
备注:NeurIPS 2022, Project page at this https URL
摘要:在开发深度学习模型时,我们通常会先确定要解决的任务,然后寻找一个能够很好地概括该任务的模型。一个有趣的问题是:如果我们修复模型并在任务空间中搜索,而不是修复任务并在模型空间中搜索,会怎样?我们能找到模型所概括的任务吗?它们看起来怎么样,或者它们能说明什么吗?这些就是我们在本文中要解决的问题。   我们提出一个任务发现框架,通过优化一个基于泛化的量,称为一致性得分,自动发现这样的任务的例子。我们证明了一组图像可以产生许多任务,神经网络在这些任务上具有很好的泛化能力。这些任务反映了学习框架的归纳偏差和数据中存在的统计模式,因此它们可以成为分析神经网络及其偏差的有用工具。作为一个例子,我们展示了发现的任务可以用来自动创建对抗性的训练-测试分裂,这使得模型在测试时失败,而不改变像素或标签,而仅仅通过选择数据点应该如何在训练集和测试集之间分裂。最后,我们讨论了所发现任务的人类可解释性。
摘要:When developing deep learning models, we usually decide what task we want to solve then search for a model that generalizes well on the task. An intriguing question would be: what if, instead of fixing the task and searching in the model space, we fix the model and search in the task space? Can we find tasks that the model generalizes on? How do they look, or do they indicate anything? These are the questions we address in this paper.  We propose a task discovery framework that automatically finds examples of such tasks via optimizing a generalization-based quantity called agreement score. We demonstrate that one set of images can give rise to many tasks on which neural networks generalize well. These tasks are a reflection of the inductive biases of the learning framework and the statistical patterns present in the data, thus they can make a useful tool for analysing the neural networks and their biases. As an example, we show that the discovered tasks can be used to automatically create adversarial train-test splits which make a model fail at test time, without changing the pixels or labels, but by only selecting how the datapoints should be split between the train and test sets. We end with a discussion on human-interpretability of the discovered tasks.
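一致性得分(agreement score)本身很容易演示:用不同随机种子训练两个网络,度量它们在留出点上预测一致的比例。下面是一个基于sklearn的最小示意(论文是在任务空间中优化该分数,此处只计算分数):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

def agreement_score(X, y, seeds=(0, 1), test_frac=0.3):
    """一致性得分的简化版:不同随机种子训练的两个网络在测试点上预测一致的比例。"""
    n_test = int(len(X) * test_frac)
    Xtr, ytr, Xte = X[:-n_test], y[:-n_test], X[-n_test:]
    preds = []
    for s in seeds:
        m = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=s)
        preds.append(m.fit(Xtr, ytr).predict(Xte))
    return float(np.mean(preds[0] == preds[1]))

X, y = make_moons(n_samples=600, noise=0.2, random_state=0)
print(agreement_score(X, y))   # 可泛化的任务应得到较高的一致性
```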


【9】 Gated Recurrent Neural Networks with Weighted Time-Delay Feedback
标题:具有加权时滞反馈的门控递归神经网络
链接:https://arxiv.org/abs/2212.00228

作者:N. Benjamin Erichson,Soon Hoe Lim,Michael W. Mahoney
机构:University of Pittsburgh, and ICSI, Nordita, KTH Royal Institute of Technology, and Stockholm University, ICSI, LBNL, and UC Berkeley
摘要:为了改进时序数据中长期依赖关系的建模,引入了一种新的带加权时延反馈机制的门控递归单元(GRU).该模型是递归单元的连续时间公式的离散化版本,其中动力学由延迟微分方程(DDE)控制。通过考虑合适的时间离散化方案,我们提出了$\tau$-GRU,一种具有延迟的离散时间门控递归单元。我们证明了连续时间模型解的存在性和唯一性,并且我们证明了所提出的反馈机制有助于改进长期依赖的建模.我们的经验结果表明,$\tau$-GRU在包括时间序列分类、人类活动识别和语音识别在内的一系列任务上比现有技术的递归单元和门控递归架构能够更快地收敛并且更好地推广。
摘要 :We introduce a novel gated recurrent unit (GRU) with a weighted time-delay feedback mechanism in order to improve the modeling of long-term dependencies in sequential data. This model is a discretized version of a continuous-time formulation of a recurrent unit, where the dynamics are governed by delay differential equations (DDEs). By considering a suitable time-discretization scheme, we propose $\tau$-GRU, a discrete-time gated recurrent unit with delay. We prove the existence and uniqueness of solutions for the continuous-time model, and we demonstrate that the proposed feedback mechanism can help improve the modeling of long-term dependencies. Our empirical results show that $\tau$-GRU can converge faster and generalize better than state-of-the-art recurrent units and gated recurrent architectures on a range of tasks, including time-series classification, human activity recognition, and speech recognition.
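下面给出一个带时滞反馈的GRU前向传播示意(numpy实现)。h_{t-τ}与标准GRU更新的具体耦合方式是我们的假设,论文中由时滞微分方程的离散化给出精确形式:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def tau_gru(X, d_h, tau=4, alpha=0.5, seed=0):
    """带加权时滞反馈的GRU前向传播示意:在标准GRU更新中额外混入 h_{t-tau}。"""
    rng = np.random.default_rng(seed)
    d_x = X.shape[1]
    Wz, Wr, Wh = (rng.normal(scale=0.1, size=(d_h, d_x + d_h)) for _ in range(3))
    hist = [np.zeros(d_h)]                      # 历史隐状态,用于取 h_{t-tau}
    for x in X:
        h = hist[-1]
        h_delay = hist[max(len(hist) - tau, 0)]
        xh = np.concatenate([x, h])
        z, r = sigmoid(Wz @ xh), sigmoid(Wr @ xh)
        h_tilde = np.tanh(Wh @ np.concatenate([x, r * h]))
        h_new = (1 - z) * h + z * h_tilde + alpha * (h_delay - h)  # 时滞反馈项(假设形式)
        hist.append(h_new)
    return np.array(hist[1:])

H = tau_gru(np.random.default_rng(1).normal(size=(20, 3)), d_h=5)
print(H.shape)   # (T, d_h)
```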


【10】 Experimental Observations of the Topology of Convolutional Neural  Network Activations
标题:卷积神经网络激活拓扑结构的实验观察
链接:https://arxiv.org/abs/2212.00222

作者:Emilie Purvine,Davis Brown,Brett Jefferson,Cliff Joslyn,Brenda Praggastis,Archit Rathore,Madelyn Shapiro,Bei Wang,Youjia Zhou
机构: Pacific Northwest National Laboratory,  Scientific Computing and Imaging (SCI) Institute and School of Computing, University of Utah
备注:Accepted at AAAI 2023. This version includes supplementary material
摘要:拓扑数据分析(TDA)是计算数学的一个分支,是代数拓扑学和数据科学的桥梁,它提供了复杂结构的紧凑、噪声鲁棒的表示。深度神经网络(DNN)学习与模型架构定义的一系列变换相关的数百万个参数,从而导致输入数据的高维、难以解释的内部表示。随着DNN在我们社会的多个部门变得越来越普遍,人们越来越认识到需要数学方法来帮助分析师、研究人员和从业人员理解和解释这些模型的内部表示与最终分类的关系。在本文中,我们应用了TDA的前沿技术,目的是深入了解用于图像分类的卷积神经网络的可解释性。我们使用两种常见的TDA方法来探索几种将隐藏层激活建模为高维点云的方法,并提供实验证据,证明这些点云捕获了关于模型过程的有价值的结构信息。首先,我们证明了基于持续同源性的距离度量可以用于量化层之间有意义的差异,并且我们在现有的用于神经网络可解释性的表示相似性度量的更广泛的背景下讨论这些距离。其次,我们展示了映射图可以提供这些模型如何在每一层组织层次类知识的语义洞察。这些观察结果表明,TDA是帮助深度学习从业者解锁其模型隐藏结构的有用工具。
摘要:Topological data analysis (TDA) is a branch of computational mathematics, bridging algebraic topology and data science, that provides compact, noise-robust representations of complex structures. Deep neural networks (DNNs) learn millions of parameters associated with a series of transformations defined by the model architecture, resulting in high-dimensional, difficult-to-interpret internal representations of input data. As DNNs become more ubiquitous across multiple sectors of our society, there is increasing recognition that mathematical methods are needed to aid analysts, researchers, and practitioners in understanding and interpreting how these models' internal representations relate to the final classification. In this paper, we apply cutting edge techniques from TDA with the goal of gaining insight into the interpretability of convolutional neural networks used for image classification. We use two common TDA approaches to explore several methods for modeling hidden-layer activations as high-dimensional point clouds, and provide experimental evidence that these point clouds capture valuable structural information about the model's process. First, we demonstrate that a distance metric based on persistent homology can be used to quantify meaningful differences between layers, and we discuss these distances in the broader context of existing representational similarity metrics for neural network interpretability. Second, we show that a mapper graph can provide semantic insight into how these models organize hierarchical class knowledge at each layer. These observations demonstrate that TDA is a useful tool to help deep learning practitioners unlock the hidden structures of their models.


【11】 Multi-Task Imitation Learning for Linear Dynamical Systems
标题:线性动态系统的多任务仿真学习
链接:https://arxiv.org/abs/2212.00186

作者:Thomas T. Zhang,Katie Kang,Bruce D. Lee,Claire Tomlin,Sergey Levine,Stephen Tu,Nikolai Matni
机构:University of Pennsylvania, University of California, Berkeley, Google Research, Brain Team
摘要:我们研究线性系统上高效模仿学习的表示学习。具体地,我们考虑学习分为两个阶段的设定:(a)预训练步骤,从$H$个源策略学习一个共享的$k$维表示;(b)目标策略微调步骤,用学到的表示来参数化策略类。我们发现,学到的目标策略所生成轨迹上的模仿差距以$\tilde{O}\left( \frac{k n_x}{HN_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}}\right)$为界,其中$n_x > k$是状态维数,$n_u$是输入维数,$N_{\mathrm{shared}}$表示表示学习期间为每个策略收集的数据总量,$N_{\mathrm{target}}$是目标任务数据量。该结果形式化了如下直觉:聚合相关任务的数据来学习表示,可以显著提高学习目标任务的样本效率。该界所揭示的趋势在仿真中得到了印证。
摘要:We study representation learning for efficient imitation learning over linear systems. In particular, we consider a setting where learning is split into two phases: (a) a pre-training step where a shared $k$-dimensional representation is learned from $H$ source policies, and (b) a target policy fine-tuning step where the learned representation is used to parameterize the policy class. We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left( \frac{k n_x}{HN_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}}\right)$, where $n_x > k$ is the state dimension, $n_u$ is the input dimension, $N_{\mathrm{shared}}$ denotes the total amount of data collected for each policy during representation learning, and $N_{\mathrm{target}}$ is the amount of target task data. This result formalizes the intuition that aggregating data across related tasks to learn a representation can significantly improve the sample efficiency of learning a target task. The trends suggested by this bound are corroborated in simulation.


【12】 Optical multi-task learning using multi-wavelength diffractive deep  neural networks
标题:基于多波长衍射深度神经网络的光学多任务学习
链接:https://arxiv.org/abs/2212.00022

作者:Zhengyang Duan,Hang Chen,Xing Lin
机构:Department of Electronic Engineering, Tsinghua University, Beijing, China
摘要:光子神经网络是一种大脑启发的信息处理技术,使用光子代替电子来执行人工智能(AI)任务。然而,现有的体系结构都是针对单个任务设计的,由于任务间的竞争而导致模型性能下降,无法在单个单片系统中并行地复用不同的任务。提出了一种新的光学多任务学习系统,采用联合优化方法设计多波长衍射深度神经网络(D2NNs)。通过将多任务输入编码到多个波长通道中,该系统可以提高计算吞吐量,并显著缓解高精度并行执行多任务的竞争。我们分别设计了两个和四个光谱通道的两任务和四任务D2NN,用于对来自MNIST、FMNIST、KMNIST和EMNIST数据库的不同输入进行分类。数值仿真结果表明,在网络规模相同的情况下,多波长D2NN的多任务学习分类精度明显高于单波长D2NN.此外,通过增加网络大小,用于同时执行多个任务的多波长D2NN实现了与单独训练多个单波长D2NN以分别执行任务相当的分类精度。我们的工作为开发波分复用技术以实现高吞吐量的神经形态光子计算和更通用的人工智能系统以并行执行多个任务铺平了道路。
摘要:Photonic neural networks are brain-inspired information processing technology using photons instead of electrons to perform artificial intelligence (AI) tasks. However, existing architectures are designed for a single task but fail to multiplex different tasks in parallel within a single monolithic system due to the task competition that deteriorates the model performance. This paper proposes a novel optical multi-task learning system by designing multi-wavelength diffractive deep neural networks (D2NNs) with the joint optimization method. By encoding multi-task inputs into multi-wavelength channels, the system can increase the computing throughput and significantly alleviate the competition to perform multiple tasks in parallel with high accuracy. We design the two-task and four-task D2NNs with two and four spectral channels, respectively, for classifying different inputs from MNIST, FMNIST, KMNIST, and EMNIST databases. The numerical evaluations demonstrate that, under the same network size, multi-wavelength D2NNs achieve significantly higher classification accuracies for multi-task learning than single-wavelength D2NNs. Furthermore, by increasing the network size, the multi-wavelength D2NNs for simultaneously performing multiple tasks achieve comparable classification accuracies with respect to the individual training of multiple single-wavelength D2NNs to perform tasks separately. Our work paves the way for developing the wavelength-division multiplexing technology to achieve high-throughput neuromorphic photonic computing and more general AI systems to perform multiple tasks in parallel.


【13】 Knowledge-augmented Deep Learning and Its Applications: A Survey
标题:知识增强型深度学习及其应用综述
链接:https://arxiv.org/abs/2212.00017

作者:Zijun Cui,Tian Gao,Kartik Talamadupula,Qiang Ji
备注:Submitted to IEEE Transactions on Neural Networks and Learning Systems
摘要 :深度学习模型虽然在过去几年中在许多不同领域取得了巨大成功,但通常需要大量数据,无法在未见过的样本上表现良好,并且缺乏可解释性。在目标领域中经常存在各种先验知识,它们的使用可以缓解深度学习的不足。为了更好地模拟人类大脑的行为,已经提出了不同的高级方法来识别领域知识并将其集成到深度模型中,以实现数据高效、可推广和可解释的深度学习,我们称之为知识增强深度学习(KADL)。本文对KADL的概念进行了界定,介绍了KADL的三大任务,即:知识识别、知识表示和知识集成。与现有的针对特定知识类型的调查不同,我们提供了一个广泛而完整的领域知识分类及其表示。基于我们的分类法,我们提供了现有技术的系统综述,不同于现有的工作,调查集成方法不可知的知识分类法。本调查涵盖了现有的工作,并提供了知识增强深度学习一般领域的研究鸟瞰图。通过对大量文献的深入分析,不仅可以了解知识增强深度学习的研究现状,而且可以为知识增强深度学习的研究指明未来的发展方向。
摘要:Deep learning models, though having achieved great success in many different fields over the past years, are usually data hungry, fail to perform well on unseen samples, and lack of interpretability. Various prior knowledge often exists in the target domain and their use can alleviate the deficiencies with deep learning. To better mimic the behavior of human brains, different advanced methods have been proposed to identify domain knowledge and integrate it into deep models for data-efficient, generalizable, and interpretable deep learning, which we refer to as knowledge-augmented deep learning (KADL). In this survey, we define the concept of KADL, and introduce its three major tasks, i.e., knowledge identification, knowledge representation, and knowledge integration. Different from existing surveys that are focused on a specific type of knowledge, we provide a broad and complete taxonomy of domain knowledge and its representations. Based on our taxonomy, we provide a systematic review of existing techniques, different from existing works that survey integration approaches agnostic to taxonomy of knowledge. This survey subsumes existing works and offers a bird's-eye view of research in the general area of knowledge-augmented deep learning. The thorough and critical reviews of numerous papers help not only understand current progresses but also identify future directions for the research on knowledge-augmented deep learning.


【14】 A Light-weight, Effective and Efficient Model for Label Aggregation in  Crowdsourcing
标题:一种轻量级高效的众包标签聚合模型
链接:https://arxiv.org/abs/2212.00007

作者:Yi Yang,Zhong-Qiu Zhao,Quan Bai,Qing Liu,Weihua Li
机构:Hefei University of Technology, Hefei, China, University of Tasmania, Hobart, Australia, Data, CSIRO, Auckland University of Technology, Auckland, New Zealand
摘要:由于众包标签中存在噪声,标签聚合(LA)已成为众包标签后处理的标准流程。LA方法通过对工人质量建模,从众包标签中估计真实标签。大多数现有的LA方法本质上是迭代式的:它们需要多次遍历所有众包标签,联合地、迭代地更新真实标签和工人质量直至收敛,因此具有较高的空间和时间复杂度。本文将LA视为一个动态系统,并将其建模为动态贝叶斯网络。从该动态模型出发,我们推导出两个轻量级算法LA\textsuperscript{onepass}和LA\textsuperscript{twopass},它们最多遍历所有标签两次,即可有效且高效地估计工人质量和真实标签。得益于这种动态特性,所提算法还能在线估计真实标签,而无需重新访问历史数据。我们从理论上证明了所提算法的收敛性,并给出了工人质量估计的误差界;我们还分析了所提算法的空间和时间复杂度,证明它们与多数投票相当。在20个真实数据集上的实验表明,即便最多遍历所有标签两次,所提算法也能在离线和在线两种设定下有效且高效地聚合标签。
摘要:Due to the noises in crowdsourced labels, label aggregation (LA) has emerged as a standard procedure to post-process crowdsourced labels. LA methods estimate true labels from crowdsourced labels by modeling worker qualities. Most existing LA methods are iterative in nature. They need to traverse all the crowdsourced labels multiple times in order to jointly and iteratively update true labels and worker qualities until convergence. Consequently, these methods have high space and time complexities. In this paper, we treat LA as a dynamic system and model it as a Dynamic Bayesian network. From the dynamic model we derive two light-weight algorithms, LA\textsuperscript{onepass} and LA\textsuperscript{twopass}, which can effectively and efficiently estimate worker qualities and true labels by traversing all the labels at most twice. Due to the dynamic nature, the proposed algorithms can also estimate true labels online without re-visiting historical data. We theoretically prove the convergence property of the proposed algorithms, and bound the error of estimated worker qualities. We also analyze the space and time complexities of the proposed algorithms and show that they are equivalent to those of majority voting. Experiments conducted on 20 real-world datasets demonstrate that the proposed algorithms can effectively and efficiently aggregate labels in both offline and online settings even if they traverse all the labels at most twice.
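“最多遍历一次/两次”的流式聚合思想可以如下示意:边读取(样本, 工人, 标签)三元组,边用当前工人质量做加权投票,并用投票结果即时更新工人质量。具体的权重与更新规则为简化假设,并非论文中由动态贝叶斯网络推导出的更新式:

```python
import numpy as np

def one_pass_aggregate(labels, n_classes=2):
    """单遍在线标签聚合示意:流式处理 (item, worker, label) 三元组。"""
    quality, votes = {}, {}
    for item, worker, lab in labels:
        q = quality.get(worker, 0.6)                 # 先验:工人略优于随机
        v = votes.setdefault(item, np.zeros(n_classes))
        v[lab] += np.log(q / (1 - q) + 1e-9)         # 按对数几率加权投票
        est = int(v.argmax())                        # 当前真标签估计
        agree = 1.0 if est == lab else 0.0
        quality[worker] = 0.9 * q + 0.1 * agree      # 指数滑动更新工人质量
    return {it: int(v.argmax()) for it, v in votes.items()}, quality

stream = [(0, "a", 1), (0, "b", 1), (0, "c", 0), (1, "a", 0), (1, "c", 0)]
truth, qual = one_pass_aggregate(stream)
print(truth, qual)
```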


【15】 Using Gradient to Boost the Generalization Performance of Deep Learning  Models for Fluid Dynamics
标题:利用梯度提高流体力学深度学习模型的泛化性能
链接:https://arxiv.org/abs/2212.00716

作者:Eduardo Vital Brasil
机构:MINES ParisTech | PSL University (supervisors: Youssef Mesri, Mines ParisTech; Thibaut Munzer, Machine Learning team lead, Extrality)
备注:Master's thesis at Mines Paristech, HPC-AI program
摘要:当今,计算流体力学(CFD)是工业设计的基本工具。然而,进行此类仿真的计算成本高昂,这对需要大量仿真的真实用例(如形状优化任务)十分不利。近年来,深度学习(DL)在广泛的应用领域取得了重大飞跃,成为物理系统建模的有力候选,为CFD开辟了新前景。为绕过CFD的计算瓶颈,DL模型已被用于学习欧几里得数据,最近还扩展到非欧几里得数据(如非结构化网格和流形),从而得到更快、更高效(内存、硬件)的替代模型。然而,DL存在难以外推(泛化)到训练数据分布(设计空间)之外的内在局限。本研究提出一项新工作来提升深度学习的泛化能力:我们将物理梯度(输出对输入的导数)纳入DL模型。该策略在改善DL网络泛化方面展现出良好效果,我们的方法论/理论研究也得到了包括消融实验在内的实证验证的支持。
摘要:Nowadays, Computational Fluid Dynamics (CFD) is a fundamental tool for industrial design. However, the computational cost of doing such simulations is expensive and can be detrimental for real-world use cases where many simulations are necessary, such as the task of shape optimization. Recently, Deep Learning (DL) has achieved a significant leap in a wide spectrum of applications and became a good candidate for physical systems, opening perspectives to CFD. To circumvent the computational bottleneck of CFD, DL models have been used to learn on Euclidean data, and more recently, on non-Euclidean data such as unstuctured grids and manifolds, allowing much faster and more efficient (memory, hardware) surrogate models. Nevertheless, DL presents the intrinsic limitation of extrapolating (generalizing) out of training data distribution (design space). In this study, we present a novel work to increase the generalization capabilities of Deep Learning. To do so, we incorporate the physical gradients (derivatives of the outputs w.r.t. the inputs) to the DL models. Our strategy has shown good results towards a better generalization of DL networks and our methodological/ theoretical study is corroborated with empirical validation, including an ablation study.
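“把物理梯度纳入模型”通常表现为在数据损失之外再加一项导数匹配损失(Sobolev式训练)。下面是PyTorch最小示意,其中参考导数dy/dx在玩具问题上直接由autograd得到,实际CFD场景中则需由求解器的伴随/灵敏度分析提供(此为假设):

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x = torch.rand(256, 2, requires_grad=True)
y_raw = x[:, :1] ** 2 + torch.sin(x[:, 1:])            # 玩具“仿真”目标
dy = torch.autograd.grad(y_raw.sum(), x)[0].detach()   # 参考物理梯度 dy/dx
y = y_raw.detach()

for step in range(500):
    pred = net(x)
    gpred = torch.autograd.grad(pred.sum(), x, create_graph=True)[0]
    # 数据项 + 梯度匹配项(权重 0.1 为任意假设)
    loss = ((pred - y) ** 2).mean() + 0.1 * ((gpred - dy) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```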


【16】 Quantum Neural Networks for a Supply Chain Logistics Application
标题:量子神经网络在供应链物流中的应用
链接:https://arxiv.org/abs/2212.00576

作者:Randall Correll,Sean J. Weinberg,Fabio Sanches,Takanori Ide,Takafumi Suzuki
机构:QC Ware Corp., Palo Alto, CA USA, AISIN CORPORATION, Tokyo Research Center, Chiyoda-ku, Tokyo, Japan, Aisin Technical Center of America, San Jose, CA USA, )
备注:14 pages, 11 figures. arXiv admin note: text overlap with arXiv:2211.17078
摘要:在含噪中等规模量子(NISQ)时期,适合实际应用规模的问题实例不太可能由(几乎)纯量子算法解决;然而,经典-量子混合算法有潜力在大得多的问题实例上取得良好性能。我们在一个具有重要实际意义的问题上研究了这样一种混合算法:具有多辆卡车和复杂需求结构的供应链物流车辆路径规划。我们使用嵌入量子电路的神经网络进行强化学习。在这类神经网络中,需要把高维特征向量投影到较小的向量,以适应NISQ硬件量子比特数量的限制;我们采用多头注意力机制,而即使在经典机器学习中,这类投影也是自然且可取的。我们使用一家汽车行业公司的卡车运输物流数据,将问题分解为小型卡车编队来应用我们的方法,得到的结果可与人工卡车指派相媲美。
摘要:Problem instances of a size suitable for practical applications are not likely to be addressed during the noisy intermediate-scale quantum (NISQ) period with (almost) pure quantum algorithms. Hybrid classical-quantum algorithms have potential, however, to achieve good performance on much larger problem instances. We investigate one such hybrid algorithm on a problem of substantial importance: vehicle routing for supply chain logistics with multiple trucks and complex demand structure. We use reinforcement learning with neural networks with embedded quantum circuits. In such neural networks, projecting high-dimensional feature vectors down to smaller vectors is necessary to accommodate restrictions on the number of qubits of NISQ hardware. However, we use a multi-head attention mechanism where, even in classical machine learning, such projections are natural and desirable. We consider data from the truck routing logistics of a company in the automotive sector, and apply our methodology by decomposing into small teams of trucks, and we find results comparable to human truck assignment.


【17】 Enabling Fast Unit Commitment Constraint Screening via Learning Cost  Model
标题:通过学习成本模型实现快速机组组合约束筛选
链接:https://arxiv.org/abs/2212.00483

作者:Xuan He,Honglin Wen,Yufan Zhang,Yize Chen
机构:∗Hong Kong University of Science and Technology (Guangzhou), †Shanghai Jiao Tong University
摘要:机组组合(UC)是输电系统运营商寻找最经济可行的发电计划与调度信号的基本工具。约束筛选有望削减UC问题中的大量不活跃或冗余约束,从而通过求解规模缩减后的优化问题来加速大规模UC问题的求解,因此受到关注。标准的约束筛选方法依赖于对负荷和出力进行优化以找出起作用的线路潮流约束,但这种筛选较为保守,仍有很大比例的约束被保留在UC问题中。本文提出一种新的机器学习(ML)模型,在给定负荷输入时预测最经济的成本。该ML模型把UC决策的成本视角与基于优化的约束筛选模型衔接起来,能够筛除更高比例的运行约束。我们在样本感知与样本无关两种设定下验证了所提方法的性能,并说明该方案能在多种UC问题设置下进一步缩短计算时间。
摘要:Unit commitment (UC) are essential tools to transmission system operators for finding the most economical and feasible generation schedules and dispatch signals. Constraint screening has been receiving attention as it holds the promise for reducing a number of inactive or redundant constraints in the UC problem, so that the solution process of large scale UC problem can be accelerated by considering the reduced optimization problem. Standard constraint screening approach relies on optimizing over load and generations to find binding line flow constraints, yet the screening is conservative with a large percentage of constraints still reserved for the UC problem. In this paper, we propose a novel machine learning (ML) model to predict the most economical costs given load inputs. Such ML model bridges the cost perspectives of UC decisions to the optimization-based constraint screening model, and can screen out higher proportion of operational constraints. We verify the proposed method's performance on both sample-aware and sample-agnostic setting, and illustrate the proposed scheme can further reduce the computation time on a variety of setup for UC problems.


【18】 On the Compatibility between a Neural Network and a Partial Differential  Equation for Physics-informed Learning
标题:物理知情学习中神经网络与偏微分方程的兼容性研究
链接:https://arxiv.org/abs/2212.00270

作者:Kuangdai Leng,Jeyan Thiyagalingam
机构:Scientific Computing Department, Science and Technology Facilities Council, Rutherford Appleton Laboratory, Didcot, OX,QX, UK
备注:9 pages, 3 figures
摘要:我们揭示了物理信息神经网络(PINN)中的陷阱和机会。证明了多层感知器(MLP)只具有ReLU(Rectified Linear Unit)或类ReLU的Lipschitz激活函数时总是导致Hessian消失.这种网络施加的约束与任何二阶或更高阶偏微分方程(PDE)相矛盾。因此,基于ReLU的MLP不能形成一个允许的函数空间来逼近它们的解。受此启发,我们证明了当一个具有$C^n$个激活函数的MLP的输出层权值位于某个超平面(称为外层超平面)上时,一个$n$阶的线性偏微分方程可以严格满足.配备了外层超平面的MLP变成“物理强制的”,不再需要PDE本身的损失函数(而只需要初始和边界条件的损失函数)。这样的超平面不仅存在于MLP,而且存在于任何由全连接隐藏层尾随的网络体系结构。据我们所知,这应该是第一个强制PDE的逐点正确性的PINN架构。给出了二阶线性偏微分方程外层超平面的封闭形式表达式,并给出了一个实现.
摘要:We shed light on a pitfall and an opportunity in physics-informed neural networks (PINNs). We prove that a multilayer perceptron (MLP) only with ReLU (Rectified Linear Unit) or ReLU-like Lipschitz activation functions will always lead to a vanished Hessian. Such a network-imposed constraint contradicts any second- or higher-order partial differential equations (PDEs). Therefore, a ReLU-based MLP cannot form a permissible function space for the approximation of their solutions. Inspired by this pitfall, we prove that a linear PDE up to the $n$-th order can be strictly satisfied by an MLP with $C^n$ activation functions when the weights of its output layer lie on a certain hyperplane, as called the out-layer-hyperplane. An MLP equipped with the out-layer-hyperplane becomes "physics-enforced", no longer requiring a loss function for the PDE itself (but only those for the initial and boundary conditions). Such a hyperplane exists not only for MLPs but for any network architecture tailed by a fully-connected hidden layer. To our knowledge, this should be the first PINN architecture that enforces point-wise correctness of a PDE. We give the closed-form expression of the out-layer-hyperplane for second-order linear PDEs and provide an implementation.
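文中指出的“陷阱”可以直接数值验证:纯ReLU的MLP是输入的分段线性函数,对输入的Hessian几乎处处为零,因此无法点态满足二阶PDE。下面的PyTorch片段对比了ReLU与tanh激活下网络对输入的Hessian:

```python
import torch
from torch.autograd.functional import hessian

def input_hessian(act):
    """计算标量输出MLP对输入的Hessian矩阵。"""
    torch.manual_seed(0)
    net = torch.nn.Sequential(torch.nn.Linear(2, 32), act, torch.nn.Linear(32, 1))
    return hessian(lambda x: net(x).squeeze(), torch.randn(2))

print(input_hessian(torch.nn.ReLU()))   # 零矩阵:分段线性 => Hessian 几乎处处为零
print(input_hessian(torch.nn.Tanh()))   # 一般非零,可用于二阶PDE的PINN
```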


【19】 DEL-Dock: Molecular Docking-Enabled Modeling of DNA-Encoded Libraries
标题:Del-Dock:DNA编码库的分子对接建模
链接:https://arxiv.org/abs/2212.00136

作者:Kirill Shmilovich,Benson Chen,Theofanis Karaletos,Mohammad M. Sultan
机构:Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois; Insitro, South San Francisco, California, United States
摘要:DNA编码文库(DEL)技术通过实现组合产生的分子文库的有效测试,使得命中鉴定取得了显著进展。DEL筛选通过对标记有独特DNA条形码的分子的测序读数来测量蛋白质结合亲和力,所述DNA条形码在一系列选择实验中存活。已经部署计算模型来学习与测序计数数据相关的潜在结合亲和力;然而,这种相关性经常被在其复杂的数据生成过程中引入的各种噪声源混淆。为了对DEL计数数据进行去噪并筛选具有良好结合亲和力的分子,计算模型需要在其建模结构中进行正确的假设,以捕获数据背后的正确信号。DEL模型的最新进展集中于计数数据的概率公式,但现有方法迄今为止仅限于利用2-D分子水平表示。我们引入了一个新的范例,DEL-Dock,它结合了基于配体的描述符和来自对接的蛋白质-配体复合物的三维空间信息。3-D空间信息允许我们的模型学习实际的结合形式,而不是仅使用配体的基于结构的信息。我们表明,我们的模型能够有效地对DEL计数数据进行去噪,以预测分子富集分数,与先前的工作相比,这些分数与实验结合亲和力测量结果更好地相关。此外,通过对一组对接姿态的学习,我们证明了仅在DEL数据上训练的我们的模型隐式地学习执行良好的对接姿态选择,而不需要来自昂贵来源的蛋白质晶体结构的外部监督。
摘要:DNA-Encoded Library (DEL) technology has enabled significant advances in hit identification by enabling efficient testing of combinatorially-generated molecular libraries. DEL screens measure protein binding affinity though sequencing reads of molecules tagged with unique DNA-barcodes that survive a series of selection experiments. Computational models have been deployed to learn the latent binding affinities that are correlated to the sequenced count data; however, this correlation is often obfuscated by various sources of noise introduced in its complicated data-generation process. In order to denoise DEL count data and screen for molecules with good binding affinity, computational models require the correct assumptions in their modeling structure to capture the correct signals underlying the data. Recent advances in DEL models have focused on probabilistic formulations of count data, but existing approaches have thus far been limited to only utilizing 2-D molecule-level representations. We introduce a new paradigm, DEL-Dock, that combines ligand-based descriptors with 3-D spatial information from docked protein-ligand complexes. 3-D spatial information allows our model to learn over the actual binding modality rather than using only structured-based information of the ligand. We show that our model is capable of effectively denoising DEL count data to predict molecule enrichment scores that are better correlated with experimental binding affinity measurements compared to prior works. Moreover, by learning over a collection of docked poses we demonstrate that our model, trained only on DEL data, implicitly learns to perform good docking pose selection without requiring external supervision from expensive-to-source protein crystal structures.


其他(32篇)

【1】 Fully-Dynamic Decision Trees
标题:全动态决策树
链接:https://arxiv.org/abs/2212.00778

作者:Marco Bressan,Gabriel Damay,Mauro Sozio
机构:Department of Computer Science, University of Milan, Institut Polytechnique de Paris, T´el´ecom Paris
摘要:我们开发了第一个完全动态的算法,该算法在任意序列的插入和删除标记的例子上维护一个决策树。给定$\epsilon〉0$,我们的算法保证,在每个时间点,决策树的每个节点都使用一个分裂,其基尼增益在最优值的加性$\epsilon$内。对于实值特征,该算法具有$O\big(\frac{d\log^3 n}{\epsilon^2}\big)$的每次插入/删除的分摊运行时间,这对于二进制或分类特征改进到$O\big(\frac{d\log^2 n}{\epsilon}\big)$,同时它使用空间$O(n d)$,其中$n$是在任何时间点的最大示例数目,并且$d$是特征数目。我们的算法几乎是最优的,因为我们表明任何具有类似保证的算法都使用分摊的运行时间$\Omega(d)$和空间$\tilde{\Omega}(n d)$。我们通过在真实数据上的大量实验评估来补充我们的理论结果,显示了我们算法的有效性。
摘要 :We develop the first fully dynamic algorithm that maintains a decision tree over an arbitrary sequence of insertions and deletions of labeled examples. Given $\epsilon > 0$ our algorithm guarantees that, at every point in time, every node of the decision tree uses a split with Gini gain within an additive $\epsilon$ of the optimum. For real-valued features the algorithm has an amortized running time per insertion/deletion of $O\big(\frac{d \log^3 n}{\epsilon^2}\big)$, which improves to $O\big(\frac{d \log^2 n}{\epsilon}\big)$ for binary or categorical features, while it uses space $O(n d)$, where $n$ is the maximum number of examples at any point in time and $d$ is the number of features. Our algorithm is nearly optimal, as we show that any algorithm with similar guarantees uses amortized running time $\Omega(d)$ and space $\tilde{\Omega} (n d)$. We complement our theoretical results with an extensive experimental evaluation on real-world data, showing the effectiveness of our algorithm.
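算法所维护的核心量是每个节点分裂的基尼增益。下面给出基尼增益及按阈值扫描最优分裂的最小实现(仅演示静态计算,论文的全动态维护结构未包含在内):

```python
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_gain(x, y, thresh):
    """数值特征在阈值 thresh 处分裂的基尼增益;全动态算法需在样本增删时
    保持每个节点所用分裂的增益与最优值相差不超过 epsilon。"""
    left = x <= thresh
    n, nl = len(y), left.sum()
    if nl in (0, n):
        return 0.0
    return gini(y) - nl / n * gini(y[left]) - (n - nl) / n * gini(y[~left])

rng = np.random.default_rng(5)
x = rng.random(200)
y = (x > 0.5).astype(int) ^ (rng.random(200) < 0.1)   # 带10%标签噪声的阈值概念
best = max(((gini_gain(x, y, t), t) for t in np.unique(x)), key=lambda p: p[0])
print(best)   # 最优增益及对应阈值(应接近 0.5)
```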


【2】 Exploiting Socially-Aware Tasks for Embodied Social Navigation
标题:利用体验式社交导航的社交感知任务
链接:https://arxiv.org/abs/2212.00767

作者:Enrico Cancelli,Tommaso Campari,Luciano Serafini,Angel X. Chang,Lamberto Ballan
机构: University of Padova,  Fondazione Bruno Kessler (FBK),  Simon Fraser University
摘要:学习如何在封闭和空间受限的室内环境中在人类之间导航,是具身智能体融入社会所需的关键能力。本文提出了一种端到端的结构,利用社会感知任务(称为风险和社会指南针)将推断常识性社会行为的能力注入到强化学习导航策略中。为此,我们的任务利用了碰撞的当前和未来危险的概念。此外,我们还提出了一个专门针对模拟环境中的社会导航任务而设计的评估协议。通过分析人机空间交互的最小单元--相遇,捕捉策略的细粒度特征和特性。我们在Gibson 4+和Habitat-Matterport 3D数据集上验证了我们的方法。
摘要:Learning how to navigate among humans in an occluded and spatially constrained indoor environment, is a key ability required to embodied agent to be integrated into our society. In this paper, we propose an end-to-end architecture that exploits Socially-Aware Tasks (referred as to Risk and Social Compass) to inject into a reinforcement learning navigation policy the ability to infer common-sense social behaviors. To this end, our tasks exploit the notion of immediate and future dangers of collision. Furthermore, we propose an evaluation protocol specifically designed for the Social Navigation Task in simulated environments. This is done to capture fine-grained features and characteristics of the policy by analyzing the minimal unit of human-robot spatial interaction, called Encounter. We validate our approach on Gibson4+ and Habitat-Matterport3D datasets.


【3】 P(Expression|Grammar): Probability of deriving an algebraic expression  with a probabilistic context-free grammar
标题:P(表达式|语法):使用概率上下文无关文法推导代数表达式的概率
链接:https://arxiv.org/abs/2212.00751

作者:Urh Primozič,Ljupčo Todorovski,Matej Petković
机构:University of Ljubljana, Ljubljana, Slovenia; Jožef Stefan Institute, Department of Knowledge Technologies
摘要:概率上下文无关文法在机器学习和符号回归中作为生成模型有着长期的使用记录。当用于符号回归时,它们生成代数表达式。我们将后者定义为由文法导出的字符串的等价类,并解决了用给定文法导出给定表达式的概率的计算问题。我们证明了该问题在一般情况下是不可判定的。然后,我们给出了生成线性、多项式和有理表达式的具体语法,其中存在计算给定表达式的概率的算法。对于这些文法,我们设计了计算精确概率和任意精度的有效近似的算法。
摘要:Probabilistic context-free grammars have a long-term record of use as generative models in machine learning and symbolic regression. When used for symbolic regression, they generate algebraic expressions. We define the latter as equivalence classes of strings derived by grammar and address the problem of calculating the probability of deriving a given expression with a given grammar. We show that the problem is undecidable in general. We then present specific grammars for generating linear, polynomial, and rational expressions, where algorithms for calculating the probability of a given expression exist. For those grammars, we design algorithms for calculating the exact probability and efficient approximation with arbitrary precision.
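对单个字符串(而非表达式等价类),“文法推导出给定串的概率”可用经典的inside(CKY)动态规划计算;论文的不可判定性针对的是对整个等价类求和的情形。下面是CNF文法上的最小实现,文法与概率均为示例假设:

```python
from collections import defaultdict

def inside_prob(tokens, rules, start="E"):
    """CKY式inside算法:计算CNF概率文法推导出给定符号串的概率。
    rules: {非终结符: [(右部元组, 概率), ...]},右部为单个终结符或两个非终结符。"""
    n = len(tokens)
    P = defaultdict(float)                       # P[(i, j, A)] = A 推导 tokens[i:j] 的概率
    for i, tok in enumerate(tokens):             # 终结符产生式
        for A, prods in rules.items():
            for rhs, p in prods:
                if rhs == (tok,):
                    P[(i, i + 1, A)] += p
    for span in range(2, n + 1):                 # 自底向上合并二元产生式
        for i in range(n - span + 1):
            j = i + span
            for A, prods in rules.items():
                for rhs, p in prods:
                    if len(rhs) == 2:
                        B, C = rhs
                        P[(i, j, A)] += p * sum(
                            P[(i, k, B)] * P[(k, j, C)] for k in range(i + 1, j))
    return P[(0, n, start)]

# 小型线性表达式文法(CNF,产生式与概率均为示例假设)
rules = {
    "E":  [(("E", "PT"), 0.4), (("x",), 0.6)],
    "PT": [(("P", "E"), 1.0)],
    "P":  [(("+",), 1.0)],
}
print(inside_prob(["x", "+", "x"], rules))   # 0.4 * 0.6 * 0.6 = 0.144
```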


【4】 Exploiting Kernel Compression on BNNs
标题:基于BNN的核压缩技术研究
链接:https://arxiv.org/abs/2212.00608

作者:Franyell Silfa,Jose Maria Arnau,Antonio González
机构:Department of Computer Architecture, Universitat Politecnica de Catalunya, Barcelona, Spain
摘要:二值神经网络(BNN)在真实图像分类任务上表现出色,其精度已接近面向边缘设备的全精度模型的最新水平。BNN用1比特存储输入和权重,存储需求低;其计算主要由xnor和popcount操作完成,可用简单硬件结构高效实现,因此非常适合边缘设备。然而,在移动CPU上高效支持BNN并不容易,因为频繁访存加载权重和输入会削弱其优势。BNN中若干个1比特的权重或输入会被打包成位序列,以提高存储与计算效率。我们观察到,表示一组权重的不同位序列的数量通常很少;而且在BNN层的计算过程中,一小部分位序列的使用频率远高于其他序列。据此,我们提出用霍夫曼编码对位序列进行编码,并在BNN推理时通过一张间接表进行解码;我们还提出一种聚类方案,识别最常见的位序列,并用相似的常见序列替换不常见的序列。由于常见序列用更少的比特编码,存储需求和访存次数均得以降低。我们在移动CPU上扩展了一个小型硬件结构,用于高效缓存和解码压缩后的位序列。基于Imagenet数据集和ReAacNet模型的实验结果表明,该技术可将内存需求降低1.32倍,并将性能提升1.35倍。
摘要:Binary Neural Networks (BNNs) are showing tremendous success on realistic image classification tasks. Notably, their accuracy is similar to the state-of-the-art accuracy obtained by full-precision models tailored to edge devices. In this regard, BNNs are very amenable to edge devices since they employ 1-bit to store the inputs and weights, and thus, their storage requirements are low. Also, BNNs computations are mainly done using xnor and pop-counts operations which are implemented very efficiently using simple hardware structures. Nonetheless, supporting BNNs efficiently on mobile CPUs is far from trivial since their benefits are hindered by frequent memory accesses to load weights and inputs.  In BNNs, a weight or an input is stored using one bit, and aiming to increase storage and computation efficiency, several of them are packed together as a sequence of bits. In this work, we observe that the number of unique sequences representing a set of weights is typically low. Also, we have seen that during the evaluation of a BNN layer, a small group of unique sequences is employed more frequently than others. Accordingly, we propose exploiting this observation by using Huffman Encoding to encode the bit sequences and then using an indirection table to decode them during the BNN evaluation. Also, we propose a clustering scheme to identify the most common sequences of bits and replace the less common ones with some similar common sequences. Hence, we decrease the storage requirements and memory accesses since common sequences are encoded with fewer bits.  We extend a mobile CPU by adding a small hardware structure that can efficiently cache and decode the compressed sequence of bits. We evaluate our scheme using the ReAacNet model with the Imagenet dataset. Our experimental results show that our technique can reduce memory requirement by 1.32x and improve performance by 1.35x.
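位序列的霍夫曼编码部分可以直接示意:高频序列获得更短码字,推理时用一张间接表把码字还原为原始位序列。下面是纯Python最小实现(聚类替换不常见序列的步骤未包含):

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(sequences):
    """对出现的权重位序列(字符串表示)做霍夫曼编码:高频序列 -> 短码字。"""
    freq = Counter(sequences)
    tiebreak = count()                               # 避免堆比较树结构
    heap = [(f, next(tiebreak), seq) for seq, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:                             # 反复合并两棵最小频率子树
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))
    codes = {}
    def walk(tree, prefix=""):
        if isinstance(tree, str):
            codes[tree] = prefix or "0"              # 叶子:原始位序列 -> 码字
        else:
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
    walk(heap[0][2])
    return codes

# 把4位一组的权重序列编码;重复度高的序列获得更短码字
seqs = ["0110"] * 8 + ["1111"] * 3 + ["0000"] * 2 + ["1010"]
codes = huffman_codes(seqs)
print(codes)
print("平均码长:", sum(len(codes[s]) for s in seqs) / len(seqs), "(原始为4位)")
```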


【5】 When is Cognitive Radar Beneficial?
标题:认知雷达何时才是有益的?
链接:https://arxiv.org/abs/2212.00597

作者:Charles E. Thornton,R. Michael Buehrer
机构:Bradley Department of ECE
备注:6 pages, 5 figures
摘要:基于在线强化学习的频率捷变认知雷达,何时应优于基于规则的自适应波形选择策略?我们通过研究一个动态频谱接入场景来寻求对这一问题的洞见:雷达希望在每个脉冲重复间隔内使用最宽的未被占用带宽进行发射。我们将在线学习与固定的基于规则的感知-规避策略进行比较,并表明:在简单的马尔可夫信道模型下,简单情形可以借助随机优势进行解析分析;而在更现实的信道假设下,基于学习的方法表现出更强的泛化能力。然而,对于设定明确且时间范围较短的问题,由于收敛时间的固有限制,机器学习方法可能表现不佳。我们就基于学习的方法何时有望带来收益给出结论,并为后续研究提供指导。
摘要:When should an online reinforcement learning-based frequency agile cognitive radar be expected to outperform a rule-based adaptive waveform selection strategy? We seek insight regarding this question by examining a dynamic spectrum access scenario, in which the radar wishes to transmit in the widest unoccupied bandwidth during each pulse repetition interval. Online learning is compared to a fixed rule-based sense-and-avoid strategy. We show that given a simple Markov channel model, the problem can be examined analytically for simple cases via stochastic dominance. Additionally, we show that for more realistic channel assumptions, learning-based approaches demonstrate greater ability to generalize. However, for short time-horizon problems that are well-specified, we find that machine learning approaches may perform poorly due to the inherent limitation of convergence time. We draw conclusions as to when learning-based approaches are expected to be beneficial and provide guidelines for future study.


【6】 Privacy-Preserving Data Synthetisation for Secure Information Sharing
标题:用于安全信息共享的隐私保护数据合成
链接:https://arxiv.org/abs/2212.00484

作者:Tânia Carvalho,Nuno Moniz,Pedro Faria,Luís Antunes,Nitesh Chawla
机构 :Department of Computer Science, University of Porto, Rua do Campo, Lucy Family Institute for Data and Society, University of Notre Dame, Indiana, USA, INESC TEC, Rua Dr. Roberto Frias, Porto, Portugal, TekPrivacy, Rua do Campo Alegre, Porto, Portugal
备注:10 pages, 7 figures and 3 tables
摘要:我们可以通过多种途径保护用户数据隐私,例如统计变换或生成模型,但每种方法都有严重缺陷:一方面,使用传统技术创建变换后的数据集非常耗时;另一方面,近期基于深度学习的方案除了训练阶段漫长之外,还需要大量计算资源。本文提出PrivateSMOTE,一种旨在以有竞争力的效果保护重识别风险最高的样本、同时所需时间和计算资源少得多的技术。它通过插值生成合成数据来混淆高风险样本,同时最大限度地减少原始数据的效用损失。在20个数据集上与多种传统及最先进的隐私保护方法相比,PrivateSMOTE在重识别风险方面表现出有竞争力的结果;同时其预测性能与包括生成对抗网络和变分自动编码器在内的基线相当或更高,并将它们的能耗和时间需求分别降低至少9倍和12倍。
摘要:We can protect user data privacy via many approaches, such as statistical transformation or generative models. However, each of them has critical drawbacks. On the one hand, creating a transformed data set using conventional techniques is highly time-consuming. On the other hand, in addition to long training phases, recent deep learning-based solutions require significant computational resources. In this paper, we propose PrivateSMOTE, a technique designed for competitive effectiveness in protecting cases at maximum risk of re-identification while requiring much less time and computational resources. It works by synthetic data generation via interpolation to obfuscate high-risk cases while minimizing data utility loss of the original data. Compared to multiple conventional and state-of-the-art privacy-preservation methods on 20 data sets, PrivateSMOTE demonstrates competitive results in re-identification risk. Also, it presents similar or higher predictive performance than the baselines, including generative adversarial networks and variational autoencoders, reducing their energy consumption and time requirements by a minimum factor of 9 and 12, respectively.
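PrivateSMOTE的合成步骤本质上是SMOTE式的近邻插值。下面给出该步骤的最小示意;如何判定哪些样本属于高重识别风险(如基于k-匿名检查)是假设的前置步骤,未在此实现:

```python
import numpy as np

def interpolate_cases(X_risk, k=3, n_new=None, seed=0):
    """SMOTE式插值合成:对每个高风险样本,在其与k近邻之一的连线上取随机点。"""
    rng = np.random.default_rng(seed)
    n_new = n_new or len(X_risk)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_risk))
        d = np.linalg.norm(X_risk - X_risk[i], axis=1)
        nn = np.argsort(d)[1:k + 1]                  # 排除自身的k近邻
        j = rng.choice(nn)
        lam = rng.random()                           # 插值系数
        synth.append(X_risk[i] + lam * (X_risk[j] - X_risk[i]))
    return np.array(synth)

X_risk = np.random.default_rng(6).normal(size=(50, 5))   # 假设这些是高风险样本
print(interpolate_cases(X_risk).shape)
```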


【7】 Implicit Mixture of Interpretable Experts for Global and Local  Interpretability
标题:全球和本地可解释专家的隐含混合
链接:https://arxiv.org/abs/2212.00471

作者:Nathan Elazar,Kerry Taylor
机构:Australian National University
摘要:我们研究了使用可解释专家混合(MoIE)在MNIST10上构建可解释图像分类器的可行性。MoIE使用黑盒路由器将每个输入分派给多个内在可解释的专家之一,从而揭示特定分类决策的依据。我们发现,朴素训练的MoIE会学会"作弊":黑盒路由器自行解决分类问题,而每个专家只是为某个特定类别学习一个常数函数。我们提出通过引入可解释路由器、并训练黑盒路由器的决策与可解释路由器保持一致来解决这一问题。此外,我们提出了一种新颖的隐式参数化方案,允许构建任意数量专家的混合,从而研究分类性能、局部和全局可解释性如何随专家数量的增加而变化。我们的新模型,称为可解释专家的隐式混合(IMoIE),可以在提供局部可解释性的同时达到MNIST10上最先进的分类准确率,并能以一定的分类准确率损失为代价提供全局可解释性。
摘要:We investigate the feasibility of using mixtures of interpretable experts (MoIE) to build interpretable image classifiers on MNIST10. MoIE uses a black-box router to assign each input to one of many inherently interpretable experts, thereby providing insight into why a particular classification decision was made. We find that a naively trained MoIE will learn to 'cheat', whereby the black-box router will solve the classification problem by itself, with each expert simply learning a constant function for one particular class. We propose to solve this problem by introducing interpretable routers and training the black-box router's decisions to match the interpretable router. In addition, we propose a novel implicit parameterization scheme that allows us to build mixtures of arbitrary numbers of experts, allowing us to study how classification performance, local and global interpretability vary as the number of experts is increased. Our new model, dubbed Implicit Mixture of Interpretable Experts (IMoIE) can match state-of-the-art classification accuracy on MNIST10 while providing local interpretability, and can provide global interpretability albeit at the cost of reduced classification accuracy.
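
下面是一个高度简化的Python草图,仅用于说明"黑盒路由器 + 可解释线性专家 + 路由对齐损失"的基本结构,并非论文的隐式参数化实现;所有维度与权重初始化均为假设。

```python
import numpy as np

rng = np.random.default_rng(2)
D, K, C = 16, 4, 10          # 输入维度、专家数、类别数(假设值)

# 可解释专家:每个专家是一个线性分类器
W_exp = rng.normal(size=(K, C, D)) * 0.1
# 黑盒路由器(这里用单隐层 MLP 代替)与可解释线性路由器
W1, W2 = rng.normal(size=(32, D)) * 0.1, rng.normal(size=(K, 32)) * 0.1
W_route_interp = rng.normal(size=(K, D)) * 0.1

def softmax(z):
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def forward(x):
    p_bb = softmax(W2 @ np.tanh(W1 @ x))       # 黑盒路由器给出的专家分布
    p_it = softmax(W_route_interp @ x)         # 可解释路由器给出的专家分布
    k = p_bb.argmax()                          # 将输入分派给单个专家
    logits = W_exp[k] @ x
    # 训练目标(示意):分类损失之外,再使黑盒路由决策贴近可解释路由器
    align_loss = -np.sum(p_it * np.log(p_bb + 1e-9))
    return logits, align_loss

logits, align = forward(rng.normal(size=D))
print(logits.shape, float(align))
```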


【8】 Regularization with Fake Features
标题:带有伪特征的正则化
链接:https://arxiv.org/abs/2212.00433

作者:Martin Hellkvist,Ayça Özçelikkale,Anders Ahlén
机构: Department of Electrical Engineering, Uppsala University, Sweden
摘要:大规模过参数化模型的最新成功激发了一条新的研究路线,探究使过参数化模型能够良好泛化的内在条件。本文考虑一种框架,其中可能过参数化的模型包含假特征,即存在于模型中但不存在于数据中的特征。我们给出了岭回归问题在含有假特征这一模型错误设定下泛化误差的非渐近高概率界。我们的高概率结果刻画了假特征提供的隐式正则化与岭参数提供的显式正则化之间的相互作用。我们观察到,即使假特征与数据无关,它们也可能改善泛化误差。
摘要:Recent successes of massively overparameterized models have inspired a new line of work investigating the underlying conditions that enable overparameterized models to generalize well. This paper considers a framework where the possibly overparametrized model includes fake features, i.e., features that are present in the model but not in the data. We present a non-asymptotic high-probability bound on the generalization error of the ridge regression problem under the model misspecification of having fake features. Our high-probability results characterize the interplay between the implicit regularization provided by the fake features and the explicit regularization provided by the ridge parameter. We observe that fake features may improve the generalization error, even though they are irrelevant to the data.
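
下面的Python小实验示意如何在岭回归中拼接与数据无关的假特征并比较测试误差;这只是对论文实验框架的一个粗略复现思路,具体数值行为依赖于所选参数。

```python
import numpy as np

rng = np.random.default_rng(3)
n, d_true, d_fake = 50, 10, 40       # 样本数、真实特征数、假特征数(假设值)

X = rng.normal(size=(n, d_true))
w = rng.normal(size=d_true)
y = X @ w + 0.1 * rng.normal(size=n)

X_fake = rng.normal(size=(n, d_fake))        # 与数据无关的假特征
X_aug = np.hstack([X, X_fake])

def ridge_fit_predict(Xtr, ytr, Xte, lam):
    d = Xtr.shape[1]
    w_hat = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(d), Xtr.T @ ytr)
    return Xte @ w_hat

Xte = rng.normal(size=(200, d_true))
yte = Xte @ w
for lam in [1e-3, 1.0]:
    err_plain = np.mean((ridge_fit_predict(X, y, Xte, lam) - yte) ** 2)
    Xte_aug = np.hstack([Xte, rng.normal(size=(200, d_fake))])
    err_fake = np.mean((ridge_fit_predict(X_aug, y, Xte_aug, lam) - yte) ** 2)
    print(f"lam={lam}: 无假特征 MSE={err_plain:.3f}, 含假特征 MSE={err_fake:.3f}")
```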


【9】 Proceedings of the 2nd International Workshop on Reading Music Systems
标题:第二届音乐阅读系统国际研讨会论文集
链接:https://arxiv.org/abs/2212.00380

作者:Jorge Calvo-Zaragoza,Alexander Pacha
备注:Proceedings edited by Jorge Calvo-Zaragoza and Alexander Pacha
摘要:阅读音乐系统国际研讨会(International Workshop on Reading Music Systems,简称WoRMS)旨在将开发音乐阅读系统的研究人员(例如光学音乐识别领域的研究人员)与可能从此类系统中受益的其他研究人员和从业人员(例如图书管理员或音乐学家)联系起来。研讨会的相关主题包括但不限于:音乐阅读系统;光学音乐识别;数据集和性能评估;乐谱图像处理;作者识别;乐谱的创作、编辑、存储和呈现系统;多模态系统;产生书面音乐的新型音乐输入方法;基于Web的音乐信息检索服务;应用和项目;与书面音乐相关的用例。以上是2019年11月2日在代尔夫特举行的第二届阅读音乐系统国际研讨会的会议记录。
摘要:The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists.  The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music.  These are the proceedings of the 2nd International Workshop on Reading Music Systems, held in Delft on the 2nd of November 2019.


【10】 Proceedings of the 3rd International Workshop on Reading Music Systems
标题:第三届国际音乐阅读系统研讨会论文集
链接:https://arxiv.org/abs/2212.00378

作者:Jorge Calvo-Zaragoza,Alexander Pacha
备注:Proceedings edited by Jorge Calvo-Zaragoza and Alexander Pacha
摘要:阅读音乐系统国际研讨会(International Workshop on Reading Music Systems,简称WoRMS)旨在将开发音乐阅读系统的研究人员(例如光学音乐识别领域的研究人员)与可能从此类系统中受益的其他研究人员和从业人员(例如图书管理员或音乐学家)联系起来。研讨会的相关主题包括但不限于:音乐阅读系统;光学音乐识别;数据集和性能评估;乐谱图像处理;作者识别;乐谱的创作、编辑、存储和呈现系统;多模态系统;产生书面音乐的新型音乐输入方法;基于Web的音乐信息检索服务;应用和项目;与书面音乐相关的用例。以上是2021年7月23日在阿利坎特举行的第三届阅读音乐系统国际研讨会的会议记录。
摘要:The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists.  The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music.  These are the proceedings of the 3rd International Workshop on Reading Music Systems, held in Alicante on the 23rd of July 2021.


【11】 AC-Band: A Combinatorial Bandit-Based Approach to Algorithm  Configuration
标题:AC-Band:一种基于组合Bandit的算法配置方法
链接:https://arxiv.org/abs/2212.00333

作者:Jasmin Brandt,Elias Schede,Viktor Bengs,Björn Haddenhorst,Eyke Hüllermeier,Kevin Tierney
机构:Department of Computer Science, Paderborn University, Germany, Decision and Operation Technologies Group, Bielefeld University, Germany, Institute of Informatics, LMU Munich, Germany, Munich Center for Machine Learning (MCML), Germany
摘要:我们研究算法配置(AC)问题,即以自动化方式为给定目标算法寻找最优参数配置。近来,在设计满足强理论保证的AC方法方面已取得重大进展,但这些方法的实际性能与最先进的启发式方法之间仍存在很大差距。为此,我们提出AC-Band,一种基于多臂老虎机(multi-armed bandits)的AC问题通用方法,它在提供理论保证的同时表现出很强的实用性能。我们表明,与其他提供理论保证的AC方法相比,AC-Band所需的计算时间明显更少,同时仍能产生高质量的配置。
摘要:We study the algorithm configuration (AC) problem, in which one seeks to find an optimal parameter configuration of a given target algorithm in an automated way. Recently, there has been significant progress in designing AC approaches that satisfy strong theoretical guarantees. However, a significant gap still remains between the practical performance of these approaches and state-of-the-art heuristic methods. To this end, we introduce AC-Band, a general approach for the AC problem based on multi-armed bandits that provides theoretical guarantees while exhibiting strong practical performance. We show that AC-Band requires significantly less computation time than other AC approaches providing theoretical guarantees while still yielding high-quality configurations.


【12】 Generalizing and Improving Jacobian and Hessian Regularization
标题:雅可比正则化和海森正则化的推广与改进
链接:https://arxiv.org/abs/2212.00311

作者:Chenwei Cui,Zehao Yan,Guangshen Liu,Liangfu Lu
机构:Boston University, Ohio State University, Tianjin University
备注:Under review by AISTATS 2023
摘要:雅可比和海森正则化旨在减小关于神经网络输入的一阶和二阶偏导数的幅度,主要用于保证图像分类器的对抗鲁棒性。本文将以往工作加以推广,把目标矩阵从零矩阵扩展到任何支持高效矩阵-向量乘积的矩阵。所提出的范式使我们能够构造新型正则项,强制方形雅可比矩阵和海森矩阵具有对称性或对角性。另一方面,雅可比和海森正则化的主要挑战在于高计算复杂度。我们引入基于Lanczos的谱范数最小化来解决这一困难:该技术采用Lanczos算法的并行化实现,能够对大型雅可比和海森矩阵进行有效而稳定的正则化。我们为所提出的范式和技术提供了理论依据和经验证据,通过探索性实验验证了新正则项的有效性,并通过对比实验将基于Lanczos的谱范数最小化与已有方法进行了比较。结果表明,所提出的方法对广泛的任务都是有利的。
摘要:Jacobian and Hessian regularization aim to reduce the magnitude of the first and second-order partial derivatives with respect to neural network inputs, and they are predominantly used to ensure the adversarial robustness of image classifiers. In this work, we generalize previous efforts by extending the target matrix from zero to any matrix that admits efficient matrix-vector products. The proposed paradigm allows us to construct novel regularization terms that enforce symmetry or diagonality on square Jacobian and Hessian matrices. On the other hand, the major challenge for Jacobian and Hessian regularization has been high computational complexity. We introduce Lanczos-based spectral norm minimization to tackle this difficulty. This technique uses a parallelized implementation of the Lanczos algorithm and is capable of effective and stable regularization of large Jacobian and Hessian matrices. Theoretical justifications and empirical evidence are provided for the proposed paradigm and technique. We carry out exploratory experiments to validate the effectiveness of our novel regularization terms. We also conduct comparative experiments to evaluate Lanczos-based spectral norm minimization against prior methods. Results show that the proposed methodologies are advantageous for a wide range of tasks.


【13】 Decentralized Matrix Factorization with Heterogeneous Differential  Privacy
标题:具有异质差分隐私的分散矩阵分解
链接:https://arxiv.org/abs/2212.00306

作者:Wentao Hu,Hui Fang
机构:Research Institute for Interdisciplinary Sciences and School of Information Management and Engineering, Shanghai University of Finance and Economics
摘要:传统的矩阵分解依赖于集中收集用户数据进行推荐,这可能增加隐私泄露的风险,尤其是当推荐者不可信时。现有的差分隐私矩阵分解方法要么假设推荐者可信,要么在推荐者不可信时只能为所有用户和项目提供统一强度的隐私保护。针对不可信推荐者,本文提出了一种新的异构差分隐私矩阵分解算法(记为HDPMF)。据我们所知,我们是第一个在不可信推荐者场景下为分散式矩阵分解实现异构差分隐私的工作。具体而言,我们的框架使用带有创新性重缩放方案的改进拉伸机制,以在隐私与准确性之间取得更好的权衡。同时,通过合理分配隐私预算,我们可以捕获单个用户/项目内部同质的隐私偏好,以及不同用户/项目之间异质的隐私偏好。理论分析证实HDPMF提供了严格的隐私保证,详尽的实验表明其优越性,尤其是在强隐私保证、高维模型和稀疏数据集场景下。
摘要:Conventional matrix factorization relies on centralized collection of users' data for recommendation, which might introduce an increased risk of privacy leakage especially when the recommender is untrusted. Existing differentially private matrix factorization methods either assume the recommender is trusted, or can only provide a uniform level of privacy protection for all users and items with untrusted recommender. In this paper, we propose a novel Heterogeneous Differentially Private Matrix Factorization algorithm (denoted as HDPMF) for untrusted recommender. To the best of our knowledge, we are the first to achieve heterogeneous differential privacy for decentralized matrix factorization in untrusted recommender scenario. Specifically, our framework uses modified stretching mechanism with an innovative rescaling scheme to achieve better trade off between privacy and accuracy. Meanwhile, by allocating privacy budget properly, we can capture homogeneous privacy preference within a user/item but heterogeneous privacy preference across different users/items. Theoretical analysis confirms that HDPMF renders rigorous privacy guarantee, and exhaustive experiments demonstrate its superiority especially in strong privacy guarantee, high dimension model and sparse dataset scenario.
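
下面给出一个带每用户隐私预算的噪声梯度矩阵分解草图,用以说明"异构差分隐私"的直觉(预算越小、注入噪声越大)。注意:这并非论文的改进拉伸机制,噪声定标方式仅为示意,不构成严格的差分隐私保证。

```python
import numpy as np

rng = np.random.default_rng(4)
n_users, n_items, d = 20, 30, 5
R = rng.random((n_users, n_items))           # 观测评分矩阵(示意)
U = rng.normal(size=(n_users, d)) * 0.1
V = rng.normal(size=(n_items, d)) * 0.1

eps_user = rng.uniform(0.5, 5.0, n_users)    # 每个用户各自的隐私预算(异构,假设值)
lr, clip = 0.05, 1.0

for epoch in range(50):
    for u in range(n_users):
        err = R[u] - U[u] @ V.T
        g = -err @ V / n_items                               # 用户 u 对自身因子的梯度
        g = g / max(1.0, np.linalg.norm(g) / clip)           # 梯度裁剪
        noise = rng.normal(scale=clip / eps_user[u], size=d) # 预算越小噪声越大
        U[u] -= lr * (g + noise)
    V -= lr * (-(R - U @ V.T).T @ U / n_users)

print("训练后 RMSE:", np.sqrt(np.mean((R - U @ V.T) ** 2)))
```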


【14】 ResNet Structure Simplification with the Convolutional Kernel Redundancy  Measure
标题:基于卷积核冗余度量的RESNET结构简化
链接:https://arxiv.org/abs/2212.00272

作者:Hongzhi Zhu,Robert Rohling,Septimiu Salcudean
机构:Department of Electrical and Computer Engineering, University of British Columbia, -, Main Mall, Vancouver, V,T ,Z, BC, Canada., School of Biomedical Engineering, University of British Columbia,-, Health, Sciences Mall, Vancouver, V,T ,Z, BC, Canada.
摘要:深度学习,特别是卷积神经网络,加速了计算机视觉的进步,给我们的日常实践带来了变化。此外,标准化的深度学习模块(也称为骨干网络),如ResNet和EfficientNet,使新的计算机视觉解决方案能够高效快速地开发。然而,深度学习方法仍然存在一些缺陷,其中最令人关注的问题之一是高内存和计算成本,以致训练和开发必须使用专用计算单元(通常为GPU)。为此,本文提出了一种可量化的评价方法,即基于感知图像差异的卷积核冗余度量,用于指导网络结构的简化。将该方法应用于基于ResNet的胸部X射线图像分类问题,在保持网络性能的同时,参数数目从超过2300万减少到约12.8万(减少了99.46%的参数)。
摘要:Deep learning, especially convolutional neural networks, has triggered accelerated advancements in computer vision, bringing changes into our daily practice. Furthermore, the standardized deep learning modules (also known as backbone networks), i.e., ResNet and EfficientNet, have enabled efficient and rapid development of new computer vision solutions. Yet, deep learning methods still suffer from several drawbacks. One of the most concerning problems is the high memory and computational cost, such that dedicated computing units, typically GPUs, have to be used for training and development. Therefore, in this paper, we propose a quantifiable evaluation method, the convolutional kernel redundancy measure, which is based on perceived image differences, for guiding the network structure simplification. When applying our method to the chest X-ray image classification problem with ResNet, our method can maintain the performance of the network and reduce the number of parameters from over $23$ million to approximately $128$ thousand (reducing $99.46\%$ of the parameters).
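
作为示意,下面用滤波器在随机探针图像块上的响应相关性来近似"卷积核冗余":响应高度相关的滤波器对可视为冗余候选。论文的度量基于感知图像差异,此处的相关性代理与阈值0.95均为本文的假设。

```python
import numpy as np

rng = np.random.default_rng(5)
n_filters, ksize = 16, 3
kernels = rng.normal(size=(n_filters, ksize * ksize))
probes = rng.normal(size=(200, ksize * ksize))       # 随机图像块作为探针

resp = probes @ kernels.T                            # 每个滤波器对探针的响应
resp = (resp - resp.mean(0)) / (resp.std(0) + 1e-9)  # 按滤波器标准化
corr = np.abs(resp.T @ resp) / len(probes)           # 滤波器响应的相关矩阵

redundant = [(i, j) for i in range(n_filters)
             for j in range(i + 1, n_filters) if corr[i, j] > 0.95]
print("高度冗余的滤波器对:", redundant)
```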


【15】 PIZZA: A new benchmark for complex end-to-end task-oriented parsing
标题:Pizas:面向复杂端到端任务解析的新基准
链接:https://arxiv.org/abs/2212.00265

作者:Konstantine Arkoudas,Nicolas Guenon des Mesnards,Melanie Rubino,Sandesh Swamy,Saarthak Khanna,Weiqi Sun,Khan Haidar
机构:Alexa AI
备注:Accepted for publication at AMLC 2022
摘要:最近在面向任务的解析方面的许多工作都集中在寻找一个中间地带:介于不具表达力但易于标注的扁平槽位与意图(intent),和表达力强但标注成本高的强大表示(如lambda演算)之间。本文通过引入一个新的数据集来继续探索面向任务的解析,该数据集用于解析比萨饼和饮料订单,其语义无法由扁平槽位和意图捕获。我们在该数据集上对面向任务解析的深度学习技术进行了广泛评估,包括不同风格的seq2seq系统和RNNG。数据集有两个主要版本:一个采用最近引入的话语级层次表示法(我们称之为TOP),另一个以可执行表示(EXR)为目标。实验证明,训练解析器直接生成EXR表示不仅一举解决了实体解析问题、克服了TOP表示的若干表达限制,而且显著提高了解析精度。
摘要:Much recent work in task-oriented parsing has focused on finding a middle ground between flat slots and intents, which are inexpressive but easy to annotate, and powerful representations such as the lambda calculus, which are expressive but costly to annotate. This paper continues the exploration of task-oriented parsing by introducing a new dataset for parsing pizza and drink orders, whose semantics cannot be captured by flat slots and intents. We perform an extensive evaluation of deep-learning techniques for task-oriented parsing on this dataset, including different flavors of seq2seq systems and RNNGs. The dataset comes in two main versions, one in a recently introduced utterance-level hierarchical notation that we call TOP, and one whose targets are executable representations (EXR). We demonstrate empirically that training the parser to directly generate EXR notation not only solves the problem of entity resolution in one fell swoop and overcomes a number of expressive limitations of TOP notation, but also results in significantly greater parsing accuracy.


【16】 Shape-Guided Diffusion with Inside-Outside Attention
标题:具有内外注意的形状引导扩散
链接:https://arxiv.org/abs/2212.00210

作者:Dong Huk Park,Grace Luo,Clayton Toste,Samaneh Azadi,Xihui Liu,Maka Karalashvili,Anna Rohrbach,Trevor Darrell
机构:UC Berkeley, Meta AI, The University of Hong Kong, BMW Group, ∗ Denotes equal contribution.
摘要:形状可以指定关键的物体约束,但现有的文本到图像扩散模型忽略了这一线索,合成出的物体会出现尺度错误、被截断或被背景内容替换等问题。我们提出了一种无需训练的方法,即形状引导扩散(Shape-Guided Diffusion),它使用新颖的内外注意(Inside-Outside Attention)机制来约束交叉注意(及自注意)图,使得指代形状内部的提示词(和像素)不能注意到形状外部,反之亦然。为了展示方法的有效性,我们提出了一个新的图像编辑任务:模型必须替换由掩码和文本提示指定的物体。我们基于MS-COCO构建了新的ShapePrompts基准,在形状忠实度、文本对齐和真实感方面,无论是定量指标还是人类偏好评估都取得了SOTA结果。数据和代码将在 https://shape-guided-diffusion.github.io 发布。
摘要:Shape can specify key object constraints, yet existing text-to-image diffusion models ignore this cue and synthesize objects that are incorrectly scaled, cut off, or replaced with background content. We propose a training-free method, Shape-Guided Diffusion, which uses a novel Inside-Outside Attention mechanism to constrain the cross-attention (and self-attention) maps such that prompt tokens (and pixels) referring to the inside of the shape cannot attend outside the shape, and vice versa. To demonstrate the efficacy of our method, we propose a new image editing task where the model must replace an object specified by its mask and a text prompt. We curate a new ShapePrompts benchmark based on MS-COCO and achieve SOTA results in shape faithfulness, text alignment, and realism according to both quantitative metrics and human preferences. Our data and code will be made available at https://shape-guided-diffusion.github.io.
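
下面的numpy草图演示"内外注意"的核心掩码思想:依据形状掩码,内部像素的查询只能注意指代内部的提示词,反之亦然。词与像素的内外划分均为随机构造的假设输入,并非论文实现。

```python
import numpy as np

rng = np.random.default_rng(6)
n_pix, n_tok, d = 64, 8, 32
Q = rng.normal(size=(n_pix, d))       # 像素查询
K = rng.normal(size=(n_tok, d))       # 提示词键
inside_pix = rng.random(n_pix) > 0.5  # 形状掩码:像素是否在形状内部
inside_tok = np.array([1, 1, 1, 0, 0, 0, 0, 0], bool)  # 词是否指代形状内部(假设)

scores = Q @ K.T / np.sqrt(d)
# 内外注意:内部像素只允许注意"内部"词,外部像素只允许注意"外部"词
mask = inside_pix[:, None] == inside_tok[None, :]
scores = np.where(mask, scores, -np.inf)
scores -= scores.max(-1, keepdims=True)               # 数值稳定的 softmax
attn = np.exp(scores)
attn /= attn.sum(-1, keepdims=True)
print(attn.shape, float(attn[0].sum()))               # 每行注意力之和为 1
```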


【17】 Five Properties of Specific Curiosity You Didn't Know Curious Machines  Should Have
标题:你不知道好奇心机器应该具备的五个特性
链接:https://arxiv.org/abs/2212.00187

作者:Nadia M. Ady,Roshan Shariff,Johannes Günther,Patrick M. Pilarski
机构:Dept. of Computing Science, University of Alberta &, Johannes G¨unther, Edmonton, Alberta, Canada &, Sony AI, Depts. of Medicine and Computing Science, University of Alberta &, Alberta Machine Intelligence Institute (Amii) &, DeepMind
备注:Submitted to the Journal of Artificial Intelligence Research (JAIR)
摘要:机器智能体的好奇心一直是活跃的研究热点。对人类和动物好奇心,特别是特定好奇心(specific curiosity)的研究,揭示了若干对机器学习器有重要益处、但尚未在机器智能中得到充分探索的性质。在这项工作中,我们对动物和机器好奇心领域进行了全面的多学科综述。作为主要贡献,我们以该综述为基础,提出并定义了我们认为特定好奇心最重要的五个性质:1)指向难以明示的指称对象;2)满足后即停止;3)自愿暴露;4)短暂性;5)连贯的长期学习。作为第二项主要贡献,我们展示了如何在一个概念验证的强化学习智能体中同时实现这些性质:我们在一个包含诱发好奇心的位置和好奇目标的简单非情节式网格世界环境中,演示了这些性质如何体现在该智能体的行为中。正如我们所期望的,这个计算型特定好奇心智能体表现出短期的定向行为,同时更新长期偏好以自适应地寻找诱发好奇心的情境。因此,这项工作将特定好奇心系统性地综合并转译到机器学习和强化学习领域,并为特定好奇心如何运作、以及未来如何融入复杂环境中目标寻求、决策的计算智能体的行为,提供了新的视角。
摘要:Curiosity for machine agents has been a focus of lively research activity. The study of human and animal curiosity, particularly specific curiosity, has unearthed several properties that would offer important benefits for machine learners, but that have not yet been well-explored in machine intelligence. In this work, we conduct a comprehensive, multidisciplinary survey of the field of animal and machine curiosity. As a principal contribution of this work, we use this survey as a foundation to introduce and define what we consider to be five of the most important properties of specific curiosity: 1) directedness towards inostensible referents, 2) cessation when satisfied, 3) voluntary exposure, 4) transience, and 5) coherent long-term learning. As a second main contribution of this work, we show how these properties may be implemented together in a proof-of-concept reinforcement learning agent: we demonstrate how the properties manifest in the behaviour of this agent in a simple non-episodic grid-world environment that includes curiosity-inducing locations and induced targets of curiosity. As we would hope, our example of a computational specific curiosity agent exhibits short-term directed behaviour while updating long-term preferences to adaptively seek out curiosity-inducing situations. This work, therefore, presents a landmark synthesis and translation of specific curiosity to the domain of machine learning and reinforcement learning and provides a novel view into how specific curiosity operates and in the future might be integrated into the behaviour of goal-seeking, decision-making computational agents in complex environments.


【18】 Layout-aware Dreamer for Embodied Referring Expression Grounding
标题:面向具体化指称表达基础的布局感知Dreamer
链接:https://arxiv.org/abs/2212.00171

作者:Mingxiao Li,Zehao Wang,Tinne Tuytelaars,Marie-Francine Moens
机构: Computer Science Department of KU Leuven,  Electrical Engineering Department (ESAT-PSI) of KU Leuven
摘要:在本文中,我们研究了具身指称表达定位(Embodied Referring Expression Grounding)问题:智能体需要在先前未见过的环境中导航,并定位由简洁的高层自然语言指令所描述的远处物体。面对这种情形,人类倾向于想象目的地可能的样子,并基于环境布局的先验知识进行探索,例如浴室更可能出现在卧室附近而非厨房附近。我们设计了一个名为布局感知梦想者(Layout-aware Dreamer,LAD)的自主智能体,其包含两个新颖的模块,即布局学习器和目标梦想器,来模拟这种认知决策过程。布局学习器学习沿路径推断相邻未探索区域的房间类别分布以进行粗略布局估计,从而有效地向智能体引入房间到房间转换的布局常识。为了学会有效地探索环境,目标梦想器会事先想象目的地。我们的智能体在REVERIE数据集的公共排行榜上取得了新的最先进性能:在充满挑战的未见测试环境中,与之前的最先进方法相比,导航成功率(SR)提高了4.02%,远程定位成功率(RGS)提高了3.43%。代码发布于https://github.com/zehao-wang/LAD
摘要:In this work, we study the problem of Embodied Referring Expression Grounding, where an agent needs to navigate in a previously unseen environment and localize a remote object described by a concise high-level natural language instruction. When facing such a situation, a human tends to imagine what the destination may look like and to explore the environment based on prior knowledge of the environmental layout, such as the fact that a bathroom is more likely to be found near a bedroom than a kitchen. We have designed an autonomous agent called Layout-aware Dreamer (LAD), including two novel modules, that is, the Layout Learner and the Goal Dreamer to mimic this cognitive decision process. The Layout Learner learns to infer the room category distribution of neighboring unexplored areas along the path for coarse layout estimation, which effectively introduces layout common sense of room-to-room transitions to our agent. To learn an effective exploration of the environment, the Goal Dreamer imagines the destination beforehand. Our agent achieves new state-of-the-art performance on the public leaderboard of the REVERIE dataset in challenging unseen test environments with improvement in navigation success (SR) by 4.02% and remote grounding success (RGS) by 3.43% compared to the previous state-of-the-art. The code is released at https://github.com/zehao-wang/LAD


【19】 Answering Private Linear Queries Adaptively using the Common Mechanism
标题:使用公共机制自适应地回答私密线性查询
链接:https://arxiv.org/abs/2212.00135

作者:Yingtai Xiao,Guanhong Wang,Danfeng Zhang,Daniel Kifer
机构:Penn State,University of Maryland, College Park
摘要:当通过隐私过滤器分析机密数据时,数据科学家通常需要决定哪些查询将最好地支持其预期分析。例如,分析者可能希望研究由机制M1产生的数据集中带噪的双向边际;但如果数据相对稀疏,分析者可能转而检查由机制M2产生的带噪单向边际。由于在M1和M2之间的选择依赖于数据,典型的差分隐私工作流是先将隐私损失预算ρ分成两部分ρ1和ρ2,然后用第一部分ρ1确定使用哪种机制,再用剩余部分ρ2从所选机制获得带噪答案。从某种意义上说,第一步似乎是浪费,因为它占用了本可用于提高查询答案准确度的那部分隐私损失预算。   在本文中,我们考虑能否在不浪费任何隐私损失预算的情况下完成M1与M2之间的选择。对于线性查询,我们提出了一种将M1和M2分解为三部分的方法:(1)捕获它们共享信息的机制M*;(2)捕获M1特有信息的机制M1';(3)捕获M2特有信息的机制M2'。一起运行M*和M1'完全等同于运行M1(无论是查询回答准确度还是总隐私成本ρ);类似地,一起运行M*和M2'完全等同于运行M2。   由于M*无论如何都会被运行,分析者可以利用其输出来决定随后运行M1'(从而重建M1所支持的分析)还是M2'(重建M2所支持的分析),而不浪费隐私损失预算。
摘要:When analyzing confidential data through a privacy filter, a data scientist often needs to decide which queries will best support their intended analysis. For example, an analyst may wish to study noisy two-way marginals in a dataset produced by a mechanism M1. But, if the data are relatively sparse, the analyst may choose to examine noisy one-way marginals, produced by a mechanism M2 instead. Since the choice of whether to use M1 or M2 is data-dependent, a typical differentially private workflow is to first split the privacy loss budget rho into two parts: rho1 and rho2, then use the first part rho1 to determine which mechanism to use, and the remainder rho2 to obtain noisy answers from the chosen mechanism. In a sense, the first step seems wasteful because it takes away part of the privacy loss budget that could have been used to make the query answers more accurate.  In this paper, we consider the question of whether the choice between M1 and M2 can be performed without wasting any privacy loss budget. For linear queries, we propose a method for decomposing M1 and M2 into three parts: (1) a mechanism M* that captures their shared information, (2) a mechanism M1' that captures information that is specific to M1, (3) a mechanism M2' that captures information that is specific to M2. Running M* and M1' together is completely equivalent to running M1 (both in terms of query answer accuracy and total privacy cost rho). Similarly, running M* and M2' together is completely equivalent to running M2.  Since M* will be used no matter what, the analyst can use its output to decide whether to subsequently run M1'(thus recreating the analysis supported by M1) or M2'(recreating the analysis supported by M2), without wasting privacy loss budget.
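
为了说明论文所要改进的基线,下面用zCDP风格的高斯机制勾勒典型的"拆分预算"自适应工作流:先花费ρ1探测数据稀疏程度以选择机制,再用ρ2回答所选查询。阈值等具体取值均为假设;论文的M*/M1'/M2'分解正是为了省去ρ1这笔开销。

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.integers(0, 4, size=1000)                 # 机密的类别数据(示意)
hist = np.bincount(data, minlength=4).astype(float)  # 一维边际

def gaussian_mech(ans, rho, sens=1.0):
    """zCDP 下的高斯机制:噪声尺度 sigma = sens / sqrt(2*rho)。"""
    return ans + rng.normal(scale=sens / np.sqrt(2 * rho), size=ans.shape)

rho = 1.0
rho1, rho2 = 0.2 * rho, 0.8 * rho                    # 典型做法:先拆分预算

# 第一步:用 rho1 的噪声答案判断数据是否稀疏,从而选择机制
probe = gaussian_mech(hist, rho1)
use_m2 = (probe < 20).mean() > 0.5                   # 稀疏则改用更粗的查询(阈值为假设)

# 第二步:把剩余的 rho2 花在选中的机制上
answer = gaussian_mech(hist if not use_m2 else hist.reshape(2, 2).sum(1), rho2)
print("选择 M2:", bool(use_m2), "答案:", np.round(answer, 1))
```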


【20】 Evidential Conditional Neural Processes
标题:证据条件神经过程
链接:https://arxiv.org/abs/2212.00131

作者:Deep Shankar Pandey,Qi Yu
机构:Rochester Institute of Technology
备注:To appear in AAAI2023 Conference
摘要:条件神经过程(CNP)模型家族以更好的可扩展性和有竞争力的预测性能,为解决Few-Shot问题提供了一个有前景的方向。然而,当前的CNP模型仅捕获对目标数据点预测的总体不确定性,缺乏对不同不确定性来源的系统性细粒度量化,而这对Few-Shot设定下的模型训练和决策至关重要。我们提出证据条件神经过程(ECNP),通过证据学习,用更丰富的层次贝叶斯结构代替CNP使用的标准高斯分布,实现认知(epistemic)与偶然(aleatoric)不确定性的分解。证据层次结构还带来了理论上可证明的对含噪训练任务的鲁棒性。对ECNP的理论分析建立了其与CNP的关系,并对证据参数的作用提供了更深入的见解。在合成数据和真实数据上的大量实验证明了所提模型在各种Few-Shot设定下的有效性。
摘要:The Conditional Neural Process (CNP) family of models offer a promising direction to tackle few-shot problems by achieving better scalability and competitive predictive performance. However, the current CNP models only capture the overall uncertainty for the prediction made on a target data point. They lack a systematic fine-grained quantification on the distinct sources of uncertainty that are essential for model training and decision-making under the few-shot setting. We propose Evidential Conditional Neural Processes (ECNP), which replace the standard Gaussian distribution used by CNP with a much richer hierarchical Bayesian structure through evidential learning to achieve epistemic-aleatoric uncertainty decomposition. The evidential hierarchical structure also leads to a theoretically justified robustness over noisy training tasks. Theoretical analysis on the proposed ECNP establishes the relationship with CNP while offering deeper insights on the roles of the evidential parameters. Extensive experiments conducted on both synthetic and real-world data demonstrate the effectiveness of our proposed model in various few-shot settings.


【21】 Incentivising cooperation by rewarding the weakest member
标题:通过奖励最弱的成员来激励合作
链接:https://arxiv.org/abs/2212.00119

作者:Jory Schossau,Bamshad Shirmohammadi,Arend Hintze
机构: BEACON Center for the Study of Evolution in Action, Michigan State University, East Lansing, Michigan, United States of America,  Department of MicroData Analytics, Dalarna University, Falun, Dalarna, Sweden
备注:11 pages, 4 figures
摘要:代表人类彼此交互的自主智能体在许多社会领域(如客户服务、运输和医疗保健)中变得越来越普遍。在这类社会情境中,贪婪策略会降低所有智能体的整体收益,例如导致高速公路上走走停停的交通,或导致通信信道上的拒绝服务。相反,我们希望自主决策在实现高效绩效的同时兼顾群体的公平性,以避免这些陷阱。不幸的是,在复杂情形下,为自私策略设计机器学习目标远比为公平行为设计容易。这里我们提出一种简单的方法:在进化和强化学习两种范式中,都依据群体中表现最弱成员的表现来奖励整个智能体群体。我们展示了这如何在最大化个体收益的同时产生更"公平"的行为,并阐明其与群体水平选择和广义适合度理论这两种生物选择机制的关系。
摘要:Autonomous agents that act with each other on behalf of humans are becoming more common in many social domains, such as customer service, transportation, and health care. In such social situations greedy strategies can reduce the positive outcome for all agents, such as leading to stop-and-go traffic on highways, or causing a denial of service on a communications channel. Instead, we desire autonomous decision-making for efficient performance while also considering equitability of the group to avoid these pitfalls. Unfortunately, in complex situations it is far easier to design machine learning objectives for selfish strategies than for equitable behaviors. Here we present a simple way to reward groups of agents in both evolution and reinforcement learning domains by the performance of their weakest member. We show how this yields ``fairer'' more equitable behavior, while also maximizing individual outcomes, and we show the relationship to biological selection mechanisms of group-level selection and inclusive fitness theory.
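
该奖励方案本身极为简单,下面几行Python即可表达其核心思想:群体中每个智能体都获得最弱成员的表现作为共同奖励。

```python
def group_reward(individual_rewards):
    """按最弱成员的表现奖励整个群体:所有智能体共享 min 值。"""
    weakest = min(individual_rewards)
    return [weakest] * len(individual_rewards)

print(group_reward([3.0, 7.5, 1.2]))   # -> [1.2, 1.2, 1.2]
```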


【22】 Towards True Lossless Sparse Communication in Multi-Agent Systems
标题:多智能体系统中走向真正意义上的无损稀疏通信
链接:https://arxiv.org/abs/2212.00115

作者:Seth Karten,Mycal Tucker,Siva Kailas,Katia Sycara
机构:Carnegie Mellon University, Massachusetts Institute of Technology
备注:12 pages, 6 figures
摘要:通信使智能体能够合作以实现目标。在带宽受限时,学习何时通信(即时间上稀疏的通信)以及向谁发送消息尤其重要。然而,近年来学习稀疏个体化通信的工作在训练过程中方差很高,减少通信往往以降低奖励为代价,在合作任务中尤甚。我们利用信息瓶颈将稀疏性重新表述为表示学习问题,并表明这自然地使得在比现有方法更低的预算下实现无损稀疏通信。本文提出了一种通过信息最大化门控稀疏多智能体通信(IMGS-MAC)实现通信真正无损稀疏的方法。我们的模型使用两个个体化的正则化目标,即信息最大化自动编码器和稀疏通信损失,来产生信息丰富且稀疏的通信。我们通过对非稀疏运行中消息的直接因果分析来评估所学的通信"语言",以确定允许zero-shot稀疏的无损稀疏预算范围,以及会带来奖励损失的稀疏预算范围;后者由我们学习的门控函数以Few-Shot稀疏方式最小化。为了证明方法的有效性,我们在通信对成功至关重要的合作多智能体任务中进行了实验,并同时评估连续和离散消息。我们的分析侧重于各种消融实验,以展示消息表示(包括其属性)的影响以及模型的无损性能。
摘要 :Communication enables agents to cooperate to achieve their goals. Learning when to communicate, i.e., sparse (in time) communication, and whom to message is particularly important when bandwidth is limited. Recent work in learning sparse individualized communication, however, suffers from high variance during training, where decreasing communication comes at the cost of decreased reward, particularly in cooperative tasks. We use the information bottleneck to reframe sparsity as a representation learning problem, which we show naturally enables lossless sparse communication at lower budgets than prior art. In this paper, we propose a method for true lossless sparsity in communication via Information Maximizing Gated Sparse Multi-Agent Communication (IMGS-MAC). Our model uses two individualized regularization objectives, an information maximization autoencoder and sparse communication loss, to create informative and sparse communication. We evaluate the learned communication `language' through direct causal analysis of messages in non-sparse runs to determine the range of lossless sparse budgets, which allow zero-shot sparsity, and the range of sparse budgets that will inquire a reward loss, which is minimized by our learned gating function with few-shot sparsity. To demonstrate the efficacy of our results, we experiment in cooperative multi-agent tasks where communication is essential for success. We evaluate our model with both continuous and discrete messages. We focus our analysis on a variety of ablations to show the effect of message representations, including their properties, and lossless performance of our model.


【23】 Reservoir Computing-based Multi-Symbol Equalization for PAM 4  Short-reach Transmission
标题:基于水库计算的PAM-4短距离传输多符号均衡
链接:https://arxiv.org/abs/2212.00738

作者:Yevhenii Osadchuk,Ognjen Jovanovic,Darko Zibar,Francesco Da Ros
机构:DTU Electro, Technical University of Denmark, DTU, Kongens Lyngby, Denmark
备注:Conference, 2 pages
摘要:我们提出了一种基于频谱切片储备池计算(reservoir computing,RC)的多符号均衡技术,用于32-GBd PAM4传输。输出为17个符号的RC与单输出情形相比,将每符号乘法次数降低了一个数量级,同时保持了简单的训练。
摘要:We propose spectrum-sliced reservoir computer-based (RC) multi-symbol equalization for 32-GBd PAM4 transmission. RC with 17 symbols at the output achieves an order of magnitude reduction in multiplications/symbol versus single output case while maintaining simple training.
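
下面用一个经典回声状态网络(ESN)在玩具ISI信道上的多符号岭回归读出,示意"多符号均衡"的思路:一次读出17个符号以摊薄每符号的乘法开销。这与论文的频谱切片光子储备池实现无关,信道与超参数均为假设。

```python
import numpy as np

rng = np.random.default_rng(8)
T, n_res, n_out = 2000, 100, 17          # 序列长度、储备池大小、同时输出的符号数(假设)

x = rng.choice([-3, -1, 1, 3], size=T).astype(float)   # PAM4 符号
# 简单的 ISI 信道 + 高斯噪声作为接收信号(示意)
r = np.convolve(x, [0.8, 0.4, 0.2], mode="same") + 0.1 * rng.normal(size=T)

W_in = rng.uniform(-0.5, 0.5, n_res)
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))        # 谱半径缩放到 0.9

states = np.zeros((T, n_res))
s = np.zeros(n_res)
for t in range(T):
    s = np.tanh(W_in * r[t] + W @ s)
    states[t] = s

# 多符号读出:一次性回归出当前时刻起的 n_out 个符号(训练仍是线性岭回归)
Y = np.stack([np.roll(x, -k) for k in range(n_out)], axis=1)[: T - n_out]
S = states[: T - n_out]
W_out = np.linalg.solve(S.T @ S + 1e-3 * np.eye(n_res), S.T @ Y)
pred = S @ W_out
print("读出 MSE:", float(np.mean((pred - Y) ** 2)))
```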


【24】 An exponentially-growing family of universal quantum circuits
标题:指数级增长的通用量子电路家族
链接:https://arxiv.org/abs/2212.00736

作者:Mohammad Kordzanganeh,Pavel Sekatski,Leonid Fedichkin,Alexey Melnikov
机构:Terra Quantum AG, Kornhausstrasse , St. Gallen, Switzerland
备注:13 pages, 7 figures
摘要:量子机器学习已成为一个日益受关注的领域,但存在一定的理论和硬件方面的限制。值得注意的是,梯度消失(即贫瘠高原)问题使得具有大量子比特数的电路无法训练,从而限制了数据科学家可用于解决问题的量子比特数量。另一方面,已有研究表明,角度嵌入的监督量子神经网络会产生截断的傅里叶级数,其阶数直接取决于两个因素:编码的深度,以及并行应用编码的量子比特数量。傅里叶级数的阶数限制了模型的表达能力。这项工作引入了两种傅里叶阶数呈指数增长的新架构:顺序和并行指数量子机器学习架构。这是通过在编码时有效利用可用的希尔伯特空间来实现的,从而增强了量子编码的表达能力。因此,指数增长允许停留在低量子比特规模来构建高表达力电路,从而避免贫瘠高原。在实践中,并行指数架构在一维测试问题中优于现有的线性架构,最终均方误差值最多可降低44.7%。此外,还在一个囚禁离子量子处理单元上验证了该技术的可行性。
摘要:Quantum machine learning has become an area of growing interest but has certain theoretical and hardware-specific limitations. Notably, the problem of vanishing gradients, or barren plateaus, renders the training impossible for circuits with high qubit counts, imposing a limit on the number of qubits that data scientists can use for solving problems. Independently, angle-embedded supervised quantum neural networks were shown to produce truncated Fourier series with a degree directly dependent on two factors: the depth of the encoding, and the number of parallel qubits the encoding is applied to. The degree of the Fourier series limits the model expressivity. This work introduces two new architectures whose Fourier degrees grow exponentially: the sequential and parallel exponential quantum machine learning architectures. This is done by efficiently using the available Hilbert space when encoding, increasing the expressivity of the quantum encoding. Therefore, the exponential growth allows staying at the low-qubit limit to create highly expressive circuits avoiding barren plateaus. Practically, parallel exponential architecture was shown to outperform the existing linear architectures by reducing their final mean square error value by up to 44.7% in a one-dimensional test problem. Furthermore, the feasibility of this technique was also shown on a trapped ion quantum processing unit.


【25】 On the Effective Usage of Priors in RSS-based Localization
标题:基于RSS的本地化中先验知识的有效利用
链接:https://arxiv.org/abs/2212.00728

作者:Çağkan Yapar,Fabian Jaensch,Ron Levie,Giuseppe Caire
机构:Technische Universität Berlin, de†Technion – Israel Institute of Technology
摘要:本文研究了密集城市环境中的定位问题。在这样的环境中,由于存在像建筑物这样的障碍物,在要定位的接收机(Rx)和卫星之间的视线(LOS)链路的可能性低,因此全球导航卫星系统不能提供良好的精度。因此,不得不求助于能够在非视距(NLOS)条件下可靠地操作的其它技术。最近,我们提出了一种基于接收信号强度(RSS)指纹和卷积神经网络的算法LocUNet,并证明了其相对于广泛采用的k最近邻(kNN)算法和基于到达时间(ToA)测距的最新定位方法的最新定位性能。在当前的工作中,我们首先认识到LocUNet从训练数据中学习Rx位置或Rx和发射器(Tx)关联偏好的潜在先验分布的能力,并将其高性能归因于这些。相反,我们证明了基于概率方法的经典方法,可以大大受益于这种先验信息的适当结合。通过与理论最优解的比较,证明了LocUNet算法在很多情况下都接近最优解.
摘要:In this paper, we study the localization problem in dense urban settings. In such environments, Global Navigation Satellite Systems fail to provide good accuracy due to low likelihood of line-of-sight (LOS) links between the receiver (Rx) to be located and the satellites, due to the presence of obstacles like the buildings. Thus, one has to resort to other technologies, which can reliably operate under non-line-of-sight (NLOS) conditions. Recently, we proposed a Received Signal Strength (RSS) fingerprint and convolutional neural network-based algorithm, LocUNet, and demonstrated its state-of-the-art localization performance with respect to the widely adopted k-nearest neighbors (kNN) algorithm, and to state-of-the-art time of arrival (ToA) ranging-based methods. In the current work, we first recognize LocUNet's ability to learn the underlying prior distribution of the Rx position or Rx and transmitter (Tx) association preferences from the training data, and attribute its high performance to these. Conversely, we demonstrate that classical methods based on probabilistic approach, can greatly benefit from an appropriate incorporation of such prior information. Our studies also numerically prove LocUNet's close to optimal performance in many settings, by comparing it with the theoretically optimal formulations.
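
下面的小例子示意"将先验信息恰当并入经典概率方法"的含义:在候选位置网格上,把RSS似然与位置先验相乘(对数域相加)得到后验,再取MAP估计。网格大小与先验形状均为假设。

```python
import numpy as np

rng = np.random.default_rng(11)
G = 32                                        # 32x32 的候选位置网格(假设)

log_lik = -0.5 * rng.random((G, G))           # 由 RSS 测量得到的对数似然(示意)
prior = np.ones((G, G))
prior[:, :G // 2] *= 4.0                      # 先验:假设左半城区 Rx 更常见
prior /= prior.sum()

log_post = log_lik + np.log(prior)            # 贝叶斯:后验 ∝ 似然 × 先验
i, j = np.unravel_index(np.argmax(log_post), (G, G))
print("MAP 位置估计:", (i, j))
```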


【26】 Target-centered Subject Transfer Framework for EEG Data Augmentation
标题:面向脑电数据增强的以目标为中心的被试迁移框架
链接:https://arxiv.org/abs/2212.00723

作者:Kang Yin,Byeong-Hoo Lee,Byoung-Hee Kwon,Jeong-Hyun Cho
机构:Dept. Artificial Intelligence, Korea University, Seoul, Republic of Korea, Dept. Brain and Cognitive Engineering
摘要:数据增强方法被广泛探索用于增强脑电信号解码。在与被试无关的脑机接口系统中,领域自适应和泛化被用来平移源被试的数据分布以匹配目标被试,作为一种数据增强。然而,先前的工作要么引入噪声(例如,通过添加噪声或用随机噪声生成),要么修改目标数据,因而不能很好地刻画目标数据分布并妨碍进一步分析。本文提出了一种以目标为中心的被试迁移框架作为数据增强方法:首先构造源数据的一个子集以最大化源-目标相关性,然后应用生成模型将数据迁移到目标域。该框架通过增加额外的真实数据而非噪声,丰富了目标域的可解释性,与其他数据增强方法相比表现出更优的性能。大量实验验证了该方法的有效性和鲁棒性,使其成为进一步研究的有力工具。
摘要:Data augmentation approaches are widely explored for the enhancement of decoding electroencephalogram signals. In subject-independent brain-computer interface system, domain adaption and generalization are utilized to shift source subjects' data distribution to match the target subject as an augmentation. However, previous works either introduce noises (e.g., by noise addition or generation with random noises) or modify target data, thus, cannot well depict the target data distribution and hinder further analysis. In this paper, we propose a target-centered subject transfer framework as a data augmentation approach. A subset of source data is first constructed to maximize the source-target relevance. Then, the generative model is applied to transfer the data to target domain. The proposed framework enriches the explainability of target domain by adding extra real data, instead of noises. It shows superior performance compared with other data augmentation methods. Extensive experiments are conducted to verify the effectiveness and robustness of our approach as a prosperous tool for further research.


【27】 ML framework for global river flood predictions based on the Caravan  dataset
标题:基于Caravan数据集的全球河流洪水预报的ML框架
链接:https://arxiv.org/abs/2212.00719

作者:Ioanna Bouri,Manu Lahariya,Omer Nivron,Enrique Portales Julia,Dietmar Backes,Piotr Bilinski,Guy Schumann
摘要:在最初的72小时内对河流洪水做出可靠预测可以减少危害,因为应急机构有足够时间在现场准备和部署救援。在大多数高收入国家,这类河流洪水预测模型已经存在且运行相对良好;但由于数据有限,低收入国家缺乏这些模型。在此,我们基于新发布的Caravan数据集提供了首个全球河流洪水预测框架,旨在为未来的全球河流洪水预测研究提供基准。为支持泛化性声明,我们加入了自定义的数据评估划分。此外,我们提出并评估了一种新的双路径LSTM架构(2P-LSTM),并与三个基线模型进行比较。最后,我们在不属于Caravan数据集的非洲和亚洲的不同地点上评估了所生成的模型。
摘要:Reliable prediction of river floods in the first 72 hours can reduce harm because emergency agencies have sufficient time to prepare and deploy for help at the scene. Such river flood prediction models already exist and perform relatively well in most high-income countries. But, due to the limited availability of data, these models are lacking in low-income countries. Here, we offer the first global river flood prediction framework based on the newly published Caravan dataset. Our framework aims to serve as a benchmark for future global river flood prediction research. To support generalizability claims we include custom data evaluation splits. Further, we propose and evaluate a novel two-path LSTM architecture (2P-LSTM) against three baseline models. Finally, we evaluate the generated models on different locations in Africa and Asia that were not part of the Caravan dataset.


【28】 Penalized Langevin and Hamiltonian Monte Carlo Algorithms for  Constrained Sampling
标题:约束抽样的惩罚朗之万法和哈密顿法
链接:https://arxiv.org/abs/2212.00570

作者:Mert Gürbüzbalaban,Yuanhan Hu,Lingjiong Zhu
机构:Department of Management Science and Information Systems, Rutgers Business School, Piscataway, NJ , United States of America, Department of Mathematics, Florida State University, Tallahassee, FL , United States of America
摘要:我们考虑约束抽样问题,其目标是从分布$\pi(x)\propto e^{-f(x)}$中抽样,其中$x$被约束在凸体$\mathcal{C}\subset \mathbb{R}^d$上。受优化中惩罚方法的启发,我们提出了惩罚朗之万动力学(PLD)和惩罚哈密顿蒙特卡罗(PHMC),通过为违反约束引入惩罚函数,将约束抽样问题转化为无约束抽样问题。当$f$光滑且梯度可用时,我们证明了PLD以$\tilde{\mathcal{O}}(d/\varepsilon^{10})$的迭代复杂度将目标采样到$\varepsilon$-误差以内,其中误差以总变差距离度量,$\tilde{\mathcal{O}}(\cdot)$隐藏了若干对数因子。对于PHMC,当$f$的Hessian为Lipschitz且$\mathcal{C}$的边界足够光滑时,我们将该结果改进为$\tilde{\mathcal{O}}(\sqrt{d}/\varepsilon^{7})$。据我们所知,这些是约束抽样设定下哈密顿蒙特卡罗方法的首个收敛速度结果;该方法可以处理非凸$f$,并且在使用确定性梯度的现有方法中具有最好的维数依赖性保证。随后我们考虑可获得无偏随机梯度的设定,提出了无需Metropolis-Hastings校正步骤即可处理随机梯度的PSGLD和PSGHMC。当$f$强凸且光滑时,我们在2-Wasserstein距离下分别得到$\tilde{\mathcal{O}}(d/\varepsilon^{18})$和$\tilde{\mathcal{O}}(d\sqrt{d}/\varepsilon^{39})$的迭代复杂度。对于更一般的情形,即$f$光滑且非凸时,我们也给出了有限时间性能界和迭代复杂度结果。最后,我们在贝叶斯LASSO回归和贝叶斯约束深度学习问题上测试了我们的算法。
摘要:We consider the constrained sampling problem where the goal is to sample from a distribution $\pi(x)\propto e^{-f(x)}$ and $x$ is constrained on a convex body $\mathcal{C}\subset \mathbb{R}^d$. Motivated by penalty methods from optimization, we propose penalized Langevin Dynamics (PLD) and penalized Hamiltonian Monte Carlo (PHMC) that convert the constrained sampling problem into an unconstrained one by introducing a penalty function for constraint violations. When $f$ is smooth and the gradient is available, we show $\tilde{\mathcal{O}}(d/\varepsilon^{10})$ iteration complexity for PLD to sample the target up to an $\varepsilon$-error where the error is measured in terms of the total variation distance and $\tilde{\mathcal{O}}(\cdot)$ hides some logarithmic factors. For PHMC, we improve this result to $\tilde{\mathcal{O}}(\sqrt{d}/\varepsilon^{7})$ when the Hessian of $f$ is Lipschitz and the boundary of $\mathcal{C}$ is sufficiently smooth. To our knowledge, these are the first convergence rate results for Hamiltonian Monte Carlo methods in the constrained sampling setting that can handle non-convex $f$ and can provide guarantees with the best dimension dependency among existing methods with deterministic gradients. We then consider the setting where unbiased stochastic gradients are available. We propose PSGLD and PSGHMC that can handle stochastic gradients without Metropolis-Hasting correction steps. When $f$ is strongly convex and smooth, we obtain an iteration complexity of $\tilde{\mathcal{O}}(d/\varepsilon^{18})$ and $\tilde{\mathcal{O}}(d\sqrt{d}/\varepsilon^{39})$ respectively in the 2-Wasserstein distance. For the more general case, when $f$ is smooth and non-convex, we also provide finite-time performance bounds and iteration complexity results. Finally, we test our algorithms on Bayesian LASSO regression and Bayesian constrained deep learning problems.
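
下面给出惩罚朗之万动力学(PLD)的一个极简numpy草图:对单位球约束使用二次惩罚,使迭代点既受 $\nabla f$ 驱动,又被拉回可行域附近。目标函数、惩罚强度与步长均为示意性假设。

```python
import numpy as np

rng = np.random.default_rng(9)
eta, lam, steps = 1e-3, 50.0, 20000     # 步长、惩罚强度、迭代数(假设值)

def grad_f(x):
    """目标:f(x) = ||x||^2 / 2(示意),梯度即 x。"""
    return x

def grad_penalty(x):
    """约束:单位球 ||x|| <= 1,对违反量平方惩罚 S(x)=max(||x||-1,0)^2。"""
    nrm = np.linalg.norm(x)
    return 2 * max(nrm - 1.0, 0.0) * x / (nrm + 1e-12)

x = np.array([2.0, 2.0])                # 从可行域外初始化
samples = []
for _ in range(steps):
    drift = grad_f(x) + lam * grad_penalty(x)
    x = x - eta * drift + np.sqrt(2 * eta) * rng.normal(size=2)
    samples.append(x.copy())

samples = np.array(samples[steps // 2:])          # 丢弃前半作为 burn-in
print("约束违反比例:", float((np.linalg.norm(samples, axis=1) > 1.05).mean()))
```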


【29】 Are you using test log-likelihood correctly?
标题:您是否正确使用了测试对数似然?
链接:https://arxiv.org/abs/2212.00219

作者:Sameer K. Deshpande,Soumya Ghosh,Tin D. Nguyen,Tamara Broderick
机构:University of Wisconsin–Madison, MIT-IBM Watson AI Lab, IBM Research, Massachusetts Institute of Technology
备注:Presented at the ICBINB Workshop at NeurIPS 2022
摘要:测试对数似然(test log-likelihood)常用于比较同一数据上的不同模型,以及比较拟合同一概率模型的不同近似推理算法。我们给出一些简单的例子,说明基于测试对数似然的比较可能与依据其他目标得到的比较结果相矛盾。具体而言,我们的例子表明:(i)基于测试对数似然比较得出的预测准确性结论,可能与基于均值等其他分布量的结论不一致;(ii)获得更高测试对数似然的近似贝叶斯推理算法,未必能产生更准确的后验近似。
摘要:Test log-likelihood is commonly used to compare different models of the same data and different approximate inference algorithms for fitting the same probabilistic model. We present simple examples demonstrating how comparisons based on test log-likelihood can contradict comparisons according to other objectives. Specifically, our examples show that (i) conclusions about forecast accuracy based on test log-likelihood comparisons may not agree with conclusions based on other distributional quantities like means; and (ii) that approximate Bayesian inference algorithms that attain higher test log-likelihoods need not also yield more accurate posterior approximations.


【30】 Novel Modelling Strategies for High-frequency Stock Trading Data
标题:一种新的高频股票交易数据建模策略
链接:https://arxiv.org/abs/2212.00148

作者:Xuekui Zhang,Yuying Huang,Ke Xu,Li Xing
机构:Math&Stat Department at, University of Victoria, Victoria, Canada, Full list of author information is, available at the end of the article, †Equal contributor
备注:28 pages, 5 tables, 5 figures
摘要:证券交易所的全电子自动化近来变得流行,产生了高频的日内数据,并推动了近实时价格预测方法的发展。机器学习算法被广泛应用于股票中间价预测。将原始数据处理为预测模型的输入(例如数据抽稀和特征工程)会显著影响预测方法的性能,然而研究者很少讨论这一话题。这促使我们提出三种处理原始数据的新建模策略。我们通过分析道琼斯30种成分股的高频数据,说明这些新建模策略如何提升预测性能。在这些实验中,我们的策略通常带来统计显著的预测改进:三种策略分别将SVM模型的F1得分提高了0.056、0.087和0.016。
摘要:Full electronic automation in stock exchanges has recently become popular, generating high-frequency intraday data and motivating the development of near real-time price forecasting methods. Machine learning algorithms are widely applied to mid-price stock predictions. Processing raw data as inputs for prediction models (e.g., data thinning and feature engineering) can primarily affect the performance of the prediction methods. However, researchers rarely discuss this topic. This motivated us to propose three novel modelling strategies for processing raw data. We illustrate how our novel modelling strategies improve forecasting performance by analyzing high-frequency data of the Dow Jones 30 component stocks. In these experiments, our strategies often lead to statistically significant improvement in predictions. The three strategies improve the F1 scores of the SVM models by 0.056, 0.087, and 0.016, respectively.


【31】 Feature Selection with Distance Correlation
标题:基于距离相关的特征选择
链接:https://arxiv.org/abs/2212.00046

作者:Ranit Das,Gregor Kasieczka,David Shih
机构:NHETC, Dept. of Physics and Astronomy, Rutgers University, Piscataway, NJ , USA, Institut f¨ur Experimentalphysik, Universit¨at Hamburg, Hamburg, Germany, Center for Data and Computing in Natural Sciences (CDCS), Hamburg, Germany
备注:14 pages, 8 figures, 3 tables
摘要 :选择数据的哪些属性用作多元决策算法的输入--也称为特征选择--是用机器学习解决任何问题的重要步骤。虽然有一个明显的趋势是在大量相对未处理的输入上训练复杂的深度网络(所谓的自动化特征工程),但对于物理学中的许多任务来说,理论上动机良好和理解良好的特征集已经存在。使用这些功能可以带来许多好处,包括更好的可解释性、减少训练和运行时间以及增强的稳定性和健壮性。提出了一种基于距离相关的特征选择方法(DisCo),并在boosted top-tagging和$W$-tagging任务中验证了该方法的有效性.使用我们的方法从一组超过7,000个能量流多项式中选择特征,我们表明,我们可以通过仅使用十个特征和少两个数量级的模型参数来匹配更深层次架构的性能。
摘要:Choosing which properties of the data to use as input to multivariate decision algorithms -- a.k.a. feature selection -- is an important step in solving any problem with machine learning. While there is a clear trend towards training sophisticated deep networks on large numbers of relatively unprocessed inputs (so-called automated feature engineering), for many tasks in physics, sets of theoretically well-motivated and well-understood features already exist. Working with such features can bring many benefits, including greater interpretability, reduced training and run time, and enhanced stability and robustness. We develop a new feature selection method based on Distance Correlation (DisCo), and demonstrate its effectiveness on the tasks of boosted top- and $W$-tagging. Using our method to select features from a set of over 7,000 energy flow polynomials, we show that we can match the performance of much deeper architectures, by using only ten features and two orders-of-magnitude fewer model parameters.
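
下面给出样本距离相关的朴素 O(n^2) numpy 实现,并按 DisCo 得分对特征排序;论文的具体选择流程(如贪心或迭代筛选)细节未在摘要中给出,此处仅演示打分这一核心步骤。

```python
import numpy as np

rng = np.random.default_rng(10)

def dist_corr(x, y):
    """样本距离相关(distance correlation)的朴素 O(n^2) 实现。"""
    def centered(a):
        D = np.abs(a[:, None] - a[None, :])               # 成对距离矩阵
        return D - D.mean(0) - D.mean(1)[:, None] + D.mean()  # 双重中心化
    A, B = centered(x), centered(y)
    dcov2 = (A * B).mean()                                # 距离协方差的平方
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0

# 玩具数据:与标签非线性相关的特征应获得更高得分
n = 300
y = rng.normal(size=n)
X = np.column_stack([y ** 2, rng.normal(size=n), 0.5 * y + rng.normal(size=n)])

scores = [dist_corr(X[:, j], y) for j in range(X.shape[1])]
ranking = np.argsort(scores)[::-1]
print("各特征的 DisCo 得分:", np.round(scores, 3), "排序:", ranking)
```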


【32】 Random Copolymer inverse design system orienting on Accurate discovering  of Antimicrobial peptide-mimetic copolymers
标题:面向抗菌肽模拟共聚物精确发现的随机共聚物逆向设计系统
链接:https://arxiv.org/abs/2212.00023

作者:Tianyu Wu,Yang Tang
摘要:抗生素耐药性是最大的健康问题之一,尤其是在当前COVID-19大流行时期。拟抗菌肽共聚物因其独特的破膜杀菌机制而备受关注,迫切需要寻找更多具有广谱抗菌功效和低毒性的潜在候选物。人工智能在小分子或生物技术药物设计中表现出了显著的性能,但高分子空间的高维性和有限的实验数据限制了现有方法在共聚物设计中的应用。在此,我们通过多模型共聚物表示学习、知识蒸馏和强化学习,开发了一个通用的无规共聚物逆向设计系统。该系统通过从多模态共聚物表示中提取多种化学信息,用Few-Shot数据实现了高精度的抗菌活性预测。通过知识蒸馏预训练一个支架-装饰器生成模型,共聚物空间被大幅收缩到现有数据的邻近空间以便探索。因此,我们的强化学习算法能够适应针对特定支架以及性质或结构要求的定制生成。我们将该系统应用于所收集的拟抗菌肽共聚物数据,并发现了具有所需性质的候选共聚物。
摘要:Antimicrobial resistance is one of the biggest health problem, especially in the current period of COVID-19 pandemic. Due to the unique membrane-destruction bactericidal mechanism, antimicrobial peptide-mimetic copolymers are paid more attention and it is urgent to find more potential candidates with broad-spectrum antibacterial efficacy and low toxicity. Artificial intelligence has shown significant performance on small molecule or biotech drugs, however, the higher-dimension of polymer space and the limited experimental data restrict the application of existing methods on copolymer design. Herein, we develop a universal random copolymer inverse design system via multi-model copolymer representation learning, knowledge distillation and reinforcement learning. Our system realize a high-precision antimicrobial activity prediction with few-shot data by extracting various chemical information from multi-modal copolymer representations. By pre-training a scaffold-decorator generative model via knowledge distillation, copolymer space are greatly contracted to the near space of existing data for exploration. Thus, our reinforcement learning algorithm can be adaptive for customized generation on specific scaffolds and requirements on property or structures. We apply our system on collected antimicrobial peptide-mimetic copolymers data, and we discover candidate copolymers with desired properties.


机器翻译由腾讯交互翻译提供,仅供参考

点击“阅读原文”获取带摘要的学术速递
