Network compression can reduce the memory footprint of a neural network, increase its inference speed, and save energy. Distiller provides a PyTorch environment for prototyping and analyzing compression algorithms, such as sparsity-inducing methods and low-precision arithmetic.
Element-wise pruning (defined per layer) using magnitude thresholding, sensitivity thresholding, and target sparsity level.
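To illustrate the idea behind element-wise magnitude pruning with a target sparsity level, here is a minimal NumPy sketch (not Distiller's API — its pruners operate on PyTorch tensors via masks, and the helper name below is hypothetical):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude elements so that a fraction
    `sparsity` (in [0, 1]) of the tensor becomes zero.

    Simplified illustration of magnitude pruning with a target
    sparsity level; real pruners keep a mask alongside the weights.
    """
    flat = np.abs(weights).flatten()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # The k-th smallest magnitude becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

w = np.array([[0.1, -0.9], [0.02, 0.5]])
pruned = magnitude_prune(w, 0.5)  # zeros the two smallest magnitudes
```

Sensitivity thresholding works similarly, except the threshold is derived from the weight distribution of each layer (e.g. a multiple of its standard deviation) rather than from a fixed sparsity target.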
Structured pruning:
Convolution: 2D (kernel-wise), 3D (filter-wise), 4D (layer-wise), and channel-wise structured pruning.
Fully-connected: column-wise and row-wise structured pruning.
Filter ranking and pruning are implemented, and can easily be extended to support ranking of channels or other structures.
Distiller is designed to be extended to support new structures (e.g. block pruning).
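As an example of the structure-ranking idea, the following sketch ranks a convolution layer's filters by L1 norm and zeros the weakest ones (filter-wise, "3D" pruning). The function is a hypothetical illustration, not Distiller's own ranking API:

```python
import numpy as np

def prune_filters_by_l1(conv_weight, n_prune):
    """Rank filters of a conv weight tensor, shaped
    (num_filters, channels, kH, kW), by L1 norm and zero the
    `n_prune` weakest ones (filter-wise structured pruning)."""
    norms = np.abs(conv_weight).reshape(conv_weight.shape[0], -1).sum(axis=1)
    victims = np.argsort(norms)[:n_prune]   # filters with smallest L1 norm
    pruned = conv_weight.copy()
    pruned[victims] = 0.0
    return pruned, victims

filters = np.stack([
    np.full((1, 2, 2), 0.1),   # L1 norm 0.4 (weakest)
    np.full((1, 2, 2), 1.0),   # L1 norm 4.0
    np.full((1, 2, 2), 0.5),   # L1 norm 2.0
])
pruned, victims = prune_filters_by_l1(filters, n_prune=1)
```

Ranking channels or other structures follows the same pattern: group the weights, compute a norm per group, and prune the lowest-ranked groups.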
Pruned elements are automatically disconnected from the network and do not participate in either the forward or backward pass.
Model thinning (removal of layers, filters, and channels) is partially supported and will be extended with future PyTorch versions. You can export thinned models to inference frameworks using ONNX export.
L1-norm element-wise regularization, and Group Lasso regularization for all of the pruning structures (2D, 3D, etc.).
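For intuition, the Group Lasso term over filter-wise (3D) groups is the sum of per-filter L2 norms; the optimizer is thereby encouraged to drive entire filters to zero. A minimal sketch of the penalty (regularizer implementations typically also scale each group norm by a strength coefficient, and sometimes by group size):

```python
import numpy as np

def group_lasso_penalty(conv_weight):
    """Group Lasso term with filter-wise groups: the sum of the L2
    norms of each filter in a (num_filters, channels, kH, kW) tensor.
    Driving a group's norm to zero yields structured sparsity."""
    per_filter = conv_weight.reshape(conv_weight.shape[0], -1)
    return float(np.sqrt((per_filter ** 2).sum(axis=1)).sum())
```

The same pattern applies to 2D (kernel) or channel groups — only the grouping of the weight tensor changes.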
Flexible scheduling of pruning, regularization, and learning rate decay (compression scheduling).
One-shot and iterative pruning (and fine-tuning) are supported.
Automatic gradual pruning schedules are supported for element-wise pruning, and can be extended to support pruning structures.
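The automated gradual pruning (AGP) schedule of Zhu & Gupta (2017) is a well-known example of such a schedule: sparsity ramps from an initial to a final level along a cubic curve, pruning aggressively early and tapering off as training proceeds. A sketch of the arithmetic (function name and signature are illustrative):

```python
def agp_sparsity(step, s_initial, s_final, begin_step, end_step):
    """Sparsity target at `step` under the AGP schedule:
    s_t = s_f + (s_i - s_f) * (1 - progress)^3."""
    if step <= begin_step:
        return s_initial
    if step >= end_step:
        return s_final
    progress = (step - begin_step) / (end_step - begin_step)
    return s_final + (s_initial - s_final) * (1.0 - progress) ** 3

# Halfway through, most of the pruning has already happened:
mid = agp_sparsity(50, 0.0, 0.8, begin_step=0, end_step=100)  # 0.7
```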
The compression schedule is expressed in a YAML file, so that a single file captures the details of an experiment. This dependency-injection design decouples the Distiller scheduler and library from future algorithm extensions.
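A schedule file of this kind might look roughly as follows — the pruner name, weight names, and epoch values are illustrative, not a specification of the exact schema:

```yaml
version: 1
pruners:
  conv_pruner:
    class: AutomatedGradualPruner
    initial_sparsity: 0.05
    final_sparsity: 0.80
    weights: [module.conv1.weight, module.conv2.weight]

policies:
  - pruner:
      instance_name: conv_pruner
    starting_epoch: 0
    ending_epoch: 30
    frequency: 2
```

Pruners declare *what* to do, while policies declare *when* to do it — the split that lets the scheduler stay agnostic of the algorithms it invokes.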
8-bit quantization is implemented and lower-precision quantization methods will be added soon.
Export statistics summaries using Pandas dataframes, which makes it easy to slice, query, display and graph the data.
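Once summaries are in a DataFrame, standard Pandas operations apply directly. The column names and values below are hypothetical, just to show the kind of slicing involved:

```python
import pandas as pd

# Hypothetical per-layer sparsity summary of the kind a compression
# tool might export; column names are illustrative.
df = pd.DataFrame({
    "layer":    ["conv1", "conv2", "fc1"],
    "shape":    ["(64,3,3,3)", "(128,64,3,3)", "(512,1024)"],
    "sparsity": [0.0, 0.62, 0.87],
})

# Query for layers that are more than 50% sparse.
sparse_layers = df.query("sparsity > 0.5")["layer"].tolist()
```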
A set of Jupyter notebooks to plan experiments and analyze compression results. The graphs and visualizations you see on this page originate from the included Jupyter notebooks.
Take a look at this notebook, which compares visual aspects of dense and sparse AlexNet models.
This notebook creates performance indicator graphs from model data.
Sample implementations of published research papers, using library-provided building blocks. See the research papers discussions in our model-zoo.
Element-wise and filter-wise pruning sensitivity analysis (using L1-norm thresholding). Examine the data from some of the networks we analyzed, using this notebook.
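The core of such an analysis is sweeping the pruning level per layer and recording the effect. A minimal sketch of the thresholding sweep (in a real sensitivity analysis you would also re-evaluate accuracy at each point; the data here is synthetic):

```python
import numpy as np

def sparsity_at_threshold(weights, tau):
    """Fraction of elements whose magnitude falls below threshold `tau`."""
    return float((np.abs(weights) < tau).mean())

# Sweep L1-norm thresholds to see how quickly a layer's density
# collapses; a sensitive layer loses accuracy long before an
# insensitive one at the same sparsity.
rng = np.random.default_rng(0)
w = rng.normal(size=1000)   # stand-in for one layer's weights
curve = {tau: sparsity_at_threshold(w, tau) for tau in (0.1, 0.5, 1.0)}
```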
Logging to the console, text file and TensorBoard-formatted file.