In this work we introduce Resource-Efficient Deep Subnetworks (REDS) to tackle model adaptation to variable resources. In contrast to the state of the art, REDS use structured sparsity constructively by exploiting the permutation invariance of neurons, which allows for hardware-specific optimizations. Specifically, REDS achieve computational efficiency by (1) skipping sequential computational blocks identified by a novel iterative knapsack optimizer, and (2) leveraging simple math to re-arrange the order of operations in the REDS computational graph to take advantage of the data cache. REDS support conventional deep networks frequently deployed on the edge and provide computational benefits even for small and simple networks. We evaluate REDS on six benchmark architectures trained on the Google Speech Commands, Fashion-MNIST and CIFAR10 datasets, and test them on four off-the-shelf mobile and embedded hardware platforms. We provide a theoretical result and empirical evidence for REDS' outstanding performance in terms of the submodels' test set accuracy, and demonstrate an adaptation time of under 40 microseconds in response to dynamic resource constraints for models deployed on an Arduino Nano 33 BLE Sense through TensorFlow Lite for Microcontrollers.
Install the software packages required for reproducing the experiments by running the following command inside the project folder:
pip3 install -r requirements.txt
Run the setup.sh script to create the hierarchy of folders used to store the results of the experiments.
Install the Gurobi solver and obtain a license (a free academic license is available). To link the license and the solver to the programs, pass the --gurobi_home and --gurobi_license_file arguments to each program: the former points to the absolute path of the Gurobi installation, the latter to its license file. For example:
python kws_ds_convolution.py --gurobi_license_file path/to/license/gurobi.lic --gurobi_home path/to/installation//gurobi/gurobi1002/linux64
Replace linux64 with your operating system type/version.
For each program, you can enable GPU usage by passing a device id to the --cuda_device argument. In the default configuration, all experiment results are stored inside the /logs directory and printed to the screen.
For each program, you can specify the solver's maximum running time per iteration by passing a value in seconds to the --solver_time_limit argument. For the DS-CNN size L model, we suggest allowing at least 3 hours (10800 seconds).
All the individual subnetwork architectures can be trained in isolation by running the corresponding _full_training.py files.
Train DS-CNN models
python kws_ds_convolution.py --gurobi_license_file path/to/license/gurobi.lic --gurobi_home path/to/installation//gurobi/gurobi1002/linux64
To train the REDS DS-CNN S models on CIFAR10 or Fashion-MNIST, run the vision_ds_convolution_cifar10.py file for the former and the vision_ds_convolution_fashion_mnist.py file for the latter. The pre-trained models are stored in the models/ folder.
Train DNN models
python kws_dnn.py --gurobi_license_file path/to/license/gurobi.lic --gurobi_home path/to/installation//gurobi/gurobi1002/linux64
Train CNN models
python kws_convolution_cnn.py --gurobi_license_file path/to/license/gurobi.lic --gurobi_home path/to/installation//gurobi/gurobi1002/linux64
The results for each subnetwork configuration are obtained with the official Google TensorFlow Lite benchmarking tool. From left to right: number of model parameters, model accuracy, and model inference time as a function of the MAC percentage in each REDS subnetwork.
(1) Models of size S
(2) Models of size L
REDS's zero runtime overhead was assessed on TensorFlow Lite for Microcontrollers by implementing runtime dynamic adaptation of the deployed model and by modifying the floating-point fully connected kernel.
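The reason adaptation can be (near) zero-overhead is that, after neurons are permuted in order of importance, every REDS submodel occupies a contiguous leading slice of each layer's weights, so switching submodels needs no data movement. A minimal NumPy sketch of this idea for a single dense layer (illustrative only, not the TFLM kernel implementation; all shapes and values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Full dense layer with 64 hidden units. After a REDS-style importance-ordered
# permutation, the most important units occupy the leading rows, so a
# submodel is just a contiguous slice -- no weight copying at adaptation time.
W = rng.standard_normal((64, 16))   # (hidden_units, input_dim)
b = rng.standard_normal(64)
x = rng.standard_normal(16)

def dense_subnetwork(W, b, x, k):
    """Evaluate the dense layer restricted to its first k units (ReLU)."""
    return np.maximum(W[:k] @ x + b[:k], 0.0)

full = dense_subnetwork(W, b, x, 64)
half = dense_subnetwork(W, b, x, 32)

# The submodel's outputs coincide with the leading outputs of the full model.
assert np.allclose(half, full[:32])
```

Because the slice is defined by a single integer k per layer, "adapting" the deployed model amounts to updating a few loop bounds, which is consistent with the microsecond-scale adaptation times reported above.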
REDS's iterative knapsack formulation for depth-wise convolutions is modelled with OR-Tools; its implementation can be found here.
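At its core, each iteration selects a subset of computational blocks that maximizes total importance under a resource budget, i.e. a 0/1 knapsack. The following is a simplified dynamic-programming stand-in with hypothetical importance scores and MAC costs, not the OR-Tools model used in the paper:

```python
def knapsack(values, costs, budget):
    """0/1 knapsack via dynamic programming.

    Returns (best_value, chosen_indices): the subset of items (e.g.
    computational blocks scored by importance) maximizing total value
    under an integer resource budget (e.g. a MAC count).
    """
    n = len(values)
    dp = [0] * (budget + 1)                 # dp[c] = best value with capacity c
    choice = [[False] * (budget + 1) for _ in range(n)]
    for i in range(n):
        for c in range(budget, costs[i] - 1, -1):
            if dp[c - costs[i]] + values[i] > dp[c]:
                dp[c] = dp[c - costs[i]] + values[i]
                choice[i][c] = True
    # Backtrack to recover which blocks were kept.
    chosen, c = [], budget
    for i in range(n - 1, -1, -1):
        if choice[i][c]:
            chosen.append(i)
            c -= costs[i]
    return dp[budget], sorted(chosen)

# Hypothetical per-block importance scores and MAC costs.
values = [10, 7, 5, 3]
costs = [4, 3, 2, 1]
best, kept = knapsack(values, costs, budget=6)  # -> (15, [0, 2])
```

The actual solver additionally handles the coupling between consecutive depth-wise and point-wise layers, which is why the formulation is iterative and solved with OR-Tools rather than a plain DP.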
If you find this repository useful, please consider citing our work:
@article{corti2023reds,
  title={REDS: Resource-Efficient Deep Subnetworks for Dynamic Resource Constraints},
  author={Corti, Francesco and Maag, Balz and Schauer, Joachim and Pferschy, Ulrich and Saukh, Olga},
  journal={arXiv preprint arXiv:2311.13349},
  year={2023}
}