At Numenta, continual learning is believed to happen in the brain largely thanks to sparsity and dynamically growing synaptic connections. Sparsity of activations and connections makes it possible to condense an enormously large number of non-overlapping distributed patterns into a reasonably low-dimensional representation (e.g. 10k bits).
This means that to learn a new pattern you just need to grow new synapses to encode that knowledge, and thanks to sparsity they will rarely interfere with one another. This idea of learning by simply encoding knowledge in different sparse weights is quite powerful for continual learning, since it removes the problem of interference among weights. In standard deep nets the contribution of each weight is much more distributed and difficult to disentangle.
This is due to full connectivity and the very nature of gradient descent optimization. The idea of this project is to work on highly sparse deep nets (2-10% connectivity) and slowly grow connections, maintaining sparsity in the activations and possibly preserving old weights as much as possible (i.e. a fixed or slow learning rate?), while still using backprop.
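As a rough illustration of the core idea (a minimal sketch in plain PyTorch, not the code in this repository), a layer can be constrained to a fixed, very sparse connectivity mask so that backprop only updates the few connections that actually exist:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedLinear(nn.Module):
    """Linear layer restricted to a fixed random sparse connectivity mask."""

    def __init__(self, in_features, out_features, connectivity=0.05):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Fixed binary mask: only ~`connectivity` of the possible connections exist.
        mask = (torch.rand(out_features, in_features) < connectivity).float()
        self.register_buffer("mask", mask)

    def forward(self, x):
        # Masked-out weights contribute nothing and receive zero gradient,
        # so learning is confined to the sparse set of existing connections.
        return F.linear(x, self.linear.weight * self.mask, self.linear.bias)
```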
Possibly related, interesting papers:
- “Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights”
- “Compete to Compute”
- “Superposition of many models into one”
- “The power of Sparsity in CNNs”
- "PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning"
- “Selfless Sequential Learning”
In this codebase you will find just a few exploratory experiments trying to apply sparsity in continual learning. In particular, sparsity of both the units and the weights is enforced through the `KWinners` and `SparseWeights` implementations offered in nupic.torch.
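For instance, a dense layer can be wrapped roughly as follows (a minimal sketch; the exact argument names and the meaning of the sparsity value differ slightly across nupic.torch versions):

```python
import torch.nn as nn
from nupic.torch.modules import KWinners, SparseWeights

hidden = 1000
# Keep only a fraction of the weights of the dense layer...
fc = SparseWeights(nn.Linear(784, hidden), 0.1)
# ...and let only the top `percent_on` units fire at each step.
act = KWinners(n=hidden, percent_on=0.1, boost_strength=1.0)
sparse_layer = nn.Sequential(fc, act)
```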
At the moment, this codebase supports:
- 3 benchmarks: Permuted MNIST, Split MNIST and iCIFAR-10 (a minimal Permuted MNIST sketch is shown below).
- 2 architectures: plain MLPs and CNNs with parametrized structure.
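As an illustration of the first benchmark (a hypothetical sketch, not the loader in `benchmarks/`), a Permuted MNIST task stream can be built by applying a different fixed pixel permutation for each task:

```python
import torch
from torchvision import datasets


def permuted_mnist_tasks(num_tasks=10, root="./data"):
    """Build `num_tasks` tasks, each defined by a fixed random pixel permutation."""
    base = datasets.MNIST(root, train=True, download=True)
    x = base.data.view(len(base), -1).float() / 255.0  # flatten to 784-dim vectors
    y = base.targets
    tasks = []
    for _ in range(num_tasks):
        perm = torch.randperm(x.size(1))  # one permutation == one task
        tasks.append((x[:, perm], y))
    return tasks
```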
The main idea is to apply sparsity in these settings and see if we can achieve a better average accuracy across tasks at the end of the continual learning process. Results so far are promising, especially with MLPs, where the difference in accuracy can exceed 10% in some cases. However, more work needs to be done to scale these results to ConvNets.
Here we list the directory structure of the project:
- `benchmarks`: contains all the data loaders and utility scripts for handling the 3 benchmarks provided.
- `exps`: contains all the experiment config files.
- `models`: contains the neural network architectures considered.
- `results`: an empty directory that will contain the results of the experiments in pkl format.
- `utils`: contains all the utility scripts for the experiments, mostly building on top of numpy and pytorch.
When using an Anaconda virtual environment, all you need to do is run the following commands and conda will install everything for you (see environment.yml):

```bash
conda env create --file environment.yml
conda activate sparse_syn
pip install -r requirements.txt
```
and then run the default experiment:

```bash
python run_exps.py
```

or a specific experiment via its configuration name (all the experiment names are listed in the `exps/exps_params.cfg` file):

```bash
python run_exps.py --name <exp_name>
```
For each experiment the following parameters have been considered:
- `benchmark`: (str) Continual learning benchmark used for the experiment (`"cifar"` or `"mnist"`).
- `mnist_mode`: (str) If the `"mnist"` benchmark is used, it can be either `"perm"` or `"split"`.
- `num_batch`: (int) Number of training batches/tasks to generate (for cifar or split mnist this number should be fixed to 10 and 5, respectively).
- `cumul`: (bool) `True` to run the cumulative baseline (training on the union of all the batches' training sets).
- `sparsify`: (bool) `True` to introduce the `KWinners` and `SparseWeights` layers after every fully connected or conv layer.
- `percent_on_fc`: (float) Percentage of active units after a fully connected layer.
- `percent_on_conv`: (float) Percentage of active units after a conv layer.
- `k_inference_factor`: (float) Boosting parameter for `KWinners`.
- `boost_strength`: (float) Boosting parameter for `KWinners` (`0` to shut it off completely).
- `boost_strength_factor`: (float) Boosting parameter for `KWinners`.
- `duty_cycle_period`: (int) Boosting parameter for `KWinners`.
- `weight_sparsity_fc`: (float) Weight sparsity percentage for a fully connected layer.
- `weight_sparsity_conv`: (float) Weight sparsity percentage for a conv layer.
- `cnn`: (bool) `True` if the architecture is a CNN, otherwise MLP.
- `hidden_units`: (int) Number of units in each hidden layer.
- `hidden_layers`: (int) Number of hidden layers.
- `dropout`: (int) Dropout percentage.
- `lr`: (float) Learning rate.
- `nesterov`: (bool) Whether to use Nesterov momentum.
- `momentum`: (float) Momentum.
- `weight_decay`: (float) Weight decay.
- `mb_size`: (int) Mini-batch size.
- `train_ep`: (int) Training epochs for the first task.
- `train_ep_inc`: (int) Training epochs for the following tasks.
- `record_stats`: (bool) `True` to record stats about sparsity.
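As a rough illustration of how the sparsity-related parameters fit together (a hypothetical sketch, not the actual model-building code in `models/`; argument names follow nupic.torch but may differ across versions):

```python
import torch.nn as nn
from nupic.torch.modules import KWinners, SparseWeights


def build_mlp(input_size=784, n_classes=10, hidden_units=2000, hidden_layers=2,
              dropout=0.0, sparsify=True, percent_on_fc=0.1,
              weight_sparsity_fc=0.2, boost_strength=1.0):
    layers, in_size = [], input_size
    for _ in range(hidden_layers):
        fc = nn.Linear(in_size, hidden_units)
        if sparsify:
            # Sparse weights on the dense layer, then k-winner activations
            # so only `percent_on_fc` of the hidden units stay active.
            layers += [SparseWeights(fc, weight_sparsity_fc),
                       KWinners(n=hidden_units, percent_on=percent_on_fc,
                                boost_strength=boost_strength)]
        else:
            layers += [fc, nn.ReLU()]
        if dropout > 0:
            layers.append(nn.Dropout(dropout))
        in_size = hidden_units
    layers.append(nn.Linear(in_size, n_classes))
    return nn.Sequential(*layers)
```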