Performance

In this documentation, we present evaluation results for applying various model compression methods, including channel pruning, weight sparsification, and uniform quantization, to ResNet and MobileNet models on the ImageNet classification task.

We adopt ChannelPrunedLearner to shrink the number of channels in convolutional layers to reduce computational complexity. Instead of using the same pruning ratio for all layers, we utilize the DDPG algorithm as the RL agent to iteratively search for the optimal pruning ratio of each layer. After obtaining the optimal pruning ratios, we apply group fine-tuning to further improve the compressed model's accuracy, as demonstrated below:

| Model        | Pruning Ratio | Uniform | RL-based      | RL-based + Group Fine-tuning |
|:------------:|:-------------:|:-------:|:-------------:|:----------------------------:|
| MobileNet-v1 | 50%           | 66.5%   | 67.8% (+1.3%) | 67.9% (+1.4%)                |
| MobileNet-v1 | 60%           | 66.2%   | 66.9% (+0.7%) | 67.0% (+0.8%)                |
| MobileNet-v1 | 70%           | 64.4%   | 64.5% (+0.1%) | 64.8% (+0.4%)                |
| MobileNet-v1 | 80%           | 61.4%   | 61.4% (+0.0%) | 62.2% (+0.8%)                |

Note: The original uncompressed MobileNet-v1's top-1 accuracy is 70.89%.
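
To make the per-layer pruning step concrete, below is a minimal NumPy sketch of pruning a convolutional layer's output channels given a per-layer pruning ratio. It uses a simple L2-norm criterion as a stand-in for ChannelPrunedLearner's actual channel-selection procedure, and the names (`prune_channels`, `ratio`) are illustrative rather than taken from the learner's API:

```python
import numpy as np

def prune_channels(weights, ratio):
    """Keep the (1 - ratio) fraction of output channels with the largest L2 norm.

    weights: convolution kernel of shape (kh, kw, c_in, c_out), TensorFlow layout.
    ratio:   fraction of output channels to prune, e.g. 0.5 for a 50% pruning ratio.
    """
    c_out = weights.shape[-1]
    n_keep = max(1, int(round(c_out * (1.0 - ratio))))
    # Rank output channels by the L2 norm of their weights; keep the largest ones.
    norms = np.sqrt((weights ** 2).sum(axis=(0, 1, 2)))
    keep = np.sort(np.argsort(norms)[-n_keep:])
    return weights[..., keep], keep

# Example: prune 50% of the output channels of a 3x3 convolution.
w = np.random.randn(3, 3, 32, 64).astype(np.float32)
w_pruned, kept = prune_channels(w, ratio=0.5)
print(w_pruned.shape)  # (3, 3, 32, 32)
```

In the RL-based setting, the DDPG agent's role is to choose a different `ratio` for each layer instead of a single uniform value, which is what separates the "Uniform" and "RL-based" columns in the table above.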

We adopt WeightSparseLearner to introduce a sparsity constraint so that a large portion of model weights can be removed, which leads to a smaller model and lower FLOPs for inference. Compared with the original algorithm proposed by Zhu & Gupta (2017), we additionally incorporate network distillation and reinforcement learning algorithms to further improve the compressed model's accuracy, as shown in the table below:

| Model        | Sparsity | (Zhu & Gupta, 2017) | RL-based      |
|:------------:|:--------:|:-------------------:|:-------------:|
| MobileNet-v1 | 50%      | 69.5%               | 70.5% (+1.0%) |
| MobileNet-v1 | 75%      | 67.7%               | 68.5% (+0.8%) |
| MobileNet-v1 | 90%      | 61.8%               | 63.4% (+1.6%) |
| MobileNet-v1 | 95%      | 53.6%               | 56.8% (+3.2%) |

Note: The original uncompressed MobileNet-v1's top-1 accuracy is 70.89%.
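
The gradual sparsity schedule of Zhu & Gupta (2017) is central to this learner: the target sparsity is raised from an initial to a final value along a cubic curve, and at each step the smallest-magnitude weights are masked out. Here is a minimal sketch; the function names and step counts are illustrative, not part of WeightSparseLearner's API:

```python
import numpy as np

def sparsity_at_step(t, s_init=0.0, s_final=0.9, t_begin=0, t_end=10000):
    """Gradual sparsity schedule from Zhu & Gupta (2017):
    s_t = s_final + (s_init - s_final) * (1 - (t - t_begin) / (t_end - t_begin))^3
    """
    if t <= t_begin:
        return s_init
    if t >= t_end:
        return s_final
    frac = (t - t_begin) / float(t_end - t_begin)
    return s_final + (s_init - s_final) * (1.0 - frac) ** 3

def magnitude_mask(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitudes."""
    k = int(round(weights.size * sparsity))
    if k == 0:
        return np.ones_like(weights)
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    return (np.abs(weights) > threshold).astype(weights.dtype)

# Halfway through the schedule the target sparsity is already ~0.79 (of 0.90),
# since most small-magnitude weights can be removed early without much harm.
w = np.random.randn(256, 256).astype(np.float32)
mask = magnitude_mask(w, sparsity_at_step(5000))
print(1.0 - mask.mean())  # achieved sparsity, ~0.7875
```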

We adopt UniformQuantTFLearner to uniformly quantize model weights from 32-bit floating-point numbers to 8-bit fixed-point numbers. The resulting model can be converted into the TensorFlow Lite format for deployment on mobile devices. The following two tables show that 8-bit quantized models can be as accurate as (or even more accurate than) the original 32-bit ones, and that inference time is significantly reduced after quantization.

| Model        | Top-1 Acc. (32-bit) | Top-5 Acc. (32-bit) | Top-1 Acc. (8-bit) | Top-5 Acc. (8-bit) |
|:------------:|:-------------------:|:-------------------:|:------------------:|:------------------:|
| ResNet-18    | 70.28%              | 89.38%              | 70.31% (+0.03%)    | 89.40% (+0.02%)    |
| ResNet-50    | 75.97%              | 92.88%              | 76.01% (+0.04%)    | 92.87% (-0.01%)    |
| MobileNet-v1 | 70.89%              | 89.56%              | 71.29% (+0.40%)    | 89.79% (+0.23%)    |
| MobileNet-v2 | 71.84%              | 90.60%              | 72.26% (+0.42%)    | 90.77% (+0.17%)    |

| Model        | Hardware    | CPU            | Time (32-bit) | Time (8-bit) | Speed-up |
|:------------:|:-----------:|:--------------:|:-------------:|:------------:|:--------:|
| MobileNet-v1 | XiaoMi 8 SE | Snapdragon 710 | 156.33        | 62.60        | 2.50×    |
| MobileNet-v1 | XiaoMi 8    | Snapdragon 845 | 124.53        | 56.12        | 2.22×    |
| MobileNet-v1 | Huawei P20  | Kirin 970      | 152.54        | 68.43        | 2.23×    |
| MobileNet-v2 | XiaoMi 8 SE | Snapdragon 710 | 153.18        | 57.55        | 2.66×    |
| MobileNet-v2 | XiaoMi 8    | Snapdragon 845 | 120.59        | 49.04        | 2.46×    |
| MobileNet-v2 | Huawei P20  | Kirin 970      | 226.61        | 61.38        | 3.69×    |
Note: All reported times are in milliseconds.
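
For reference, the sketch below shows the kind of uniform asymmetric (scale plus zero-point) quantization used by TensorFlow Lite. `uniform_quantize` is an illustrative name, not UniformQuantTFLearner's API, and a real deployment would go through the TensorFlow Lite converter rather than this hand-rolled routine:

```python
import numpy as np

def uniform_quantize(x, num_bits=8):
    """Asymmetric uniform quantization: map float32 values onto the integer
    grid [0, 2^num_bits - 1] with a real-valued scale and an integer
    zero-point, then de-quantize to measure the approximation error."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate case: all input values identical
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    x_hat = (q.astype(np.float32) - zero_point) * scale  # de-quantized values
    return q, x_hat

w = np.random.randn(1000).astype(np.float32)
q, w_hat = uniform_quantize(w)
print(np.abs(w - w_hat).max())  # worst-case round-trip error, about scale / 2
```

Because the round-trip error per weight is at most about half a quantization step, the perturbation to each weight is small, which is consistent with the negligible accuracy change reported in the tables above.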