Official code and model for the paper:
- DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients.

It also contains implementations of the following papers:
- Binary Weight Network, with (W,A,G)=(1,32,32).
- Trained Ternary Quantization, with (W,A,G)=(t,32,32).
- Binarized Neural Networks, with (W,A,G)=(1,1,32).
Alternative link to this page: http://dorefa.net
This is a good set of baselines for research in model quantization. These quantization techniques, when applied to AlexNet, achieve the following ImageNet performance in this implementation:
Model | Bit Width (weights, activations, gradients) | Top 1 Validation Error <sup>1</sup> |
---|---|---|
Full Precision <sup>2</sup> | 32,32,32 | 40.3% |
TTQ | t,32,32 | 42.0% |
BWN | 1,32,32 | 44.3% ⬇️ |
BNN | 1,1,32 | 51.5% ⬇️ |
DoReFa | 8,8,8 | 42.0% ⬇️ |
DoReFa | 1,2,32 | 46.6% |
DoReFa | 1,2,6 | 46.8% ⬇️ |
DoReFa | 1,2,4 | 54.0% |
1: These numbers were obtained by training on 8 GPUs with a total batch size of 256; with a different setup the performance may differ slightly. The DoReFa-Net models reach slightly better performance than reported in our paper, due to more sophisticated augmentations.
2: Not directly comparable with the original AlexNet. Check out ../ImageNetModels for a more faithful implementation of the original AlexNet.
DoReFa-Net works on mobile and FPGA! We hosted a demo at CVPR16 on behalf of Megvii, Inc, running a 1/4-VGG size DoReFa-Net on a phone and a half-VGG size DoReFa-Net on an FPGA, in real time. DoReFa-Net and its variants have been deployed widely in Megvii's embedded products.
This code release is meant for research purposes. We're not planning to release our C++ runtime for bit-operations.
In this implementation, quantized operations are all performed through `tf.float32`. They don't make your network faster.
- Install TensorFlow>=1.7, tensorpack and scipy.
- Look at the docstring in `*-dorefa.py` to see detailed usage and performance.
Pretrained models for (1,4,32)-ResNet18 and (1,2,6)-AlexNet are available at the tensorpack model zoo. They're provided as numpy dictionaries. The binary-weight 4-bit-activation ResNet-18 model has 59.2% top-1 validation accuracy.
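As a rough sketch of how a numpy dictionary of parameters can be inspected, the snippet below writes a tiny stand-in file with `np.savez` and reads it back. It assumes the model is stored as an `.npz` archive mapping variable names to arrays; the file name `toy-model.npz` and the variable names `conv0/W` and `conv0/b` are hypothetical and do not come from the released models.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for a downloaded model file (names are made up).
path = os.path.join(tempfile.mkdtemp(), "toy-model.npz")
np.savez(path, **{
    "conv0/W": np.zeros((3, 3, 3, 8), dtype=np.float32),
    "conv0/b": np.zeros((8,), dtype=np.float32),
})

# Load the archive into a plain dict of name -> array.
with np.load(path) as f:
    params = {name: f[name] for name in f.files}

# List every stored variable with its shape and dtype.
for name in sorted(params):
    print(name, params[name].shape, params[name].dtype)
```

For a real download, replace `path` with the location of the file from the model zoo; the docstring in `*-dorefa.py` remains the reference for how the scripts expect the model to be passed.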
Please use GitHub issues for any issues related to the code itself. Please email the authors for general questions related to the paper.
If you use our code or models in your research, please cite:
@article{zhou2016dorefa,
author = {Shuchang Zhou and Yuxin Wu and Zekun Ni and Xinyu Zhou and He Wen and Yuheng Zou},
title = {DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients},
journal = {CoRR},
volume = {abs/1606.06160},
year = {2016},
url = {http://arxiv.org/abs/1606.06160},
}