Spatial Classification
If your dataset contains no temporal sequences, you can tell NuPIC to create a "spatial classification" model for it. Here, "spatial" means that all of the information required to produce an output at time 't' is present at time 't'; no historical data is required.
As an example, let's say you wanted to create a model that, given attributes of an item in the grocery store, outputs the item name. You could construct the records for this dataset as follows:
packaging, height, weight, category
glass, 6, 16, "salad dressing"
cardboard, 12, 16, "cereal"
plastic, 12, 64, "milk"
plastic, 8, 16, "salad dressing"
...
Here, the category column is what the model should output given the data in all the other columns. In contrast to a temporal problem, the data in record N-1 (or N-2, N-3, etc.) has no bearing on what the model should output at record N.
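As a small illustration of this record layout, the sketch below parses the sample records with Python's csv module and separates the input attributes from the category to be predicted. The names RAW_RECORDS and split_records are made up for this example, and this is not how the OPF itself reads its input files.

```python
import csv
import io

# Sample records copied from the example above.
RAW_RECORDS = '''packaging, height, weight, category
glass, 6, 16, "salad dressing"
cardboard, 12, 16, "cereal"
plastic, 12, 64, "milk"
plastic, 8, 16, "salad dressing"
'''


def split_records(text):
    """Return a list of (attributes, category) pairs, one per record."""
    # skipinitialspace drops the blank after each comma so that the
    # quoted category values are parsed as quoted fields.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    pairs = []
    for row in reader:
        attributes = {
            "packaging": row["packaging"],
            "height": int(row["height"]),
            "weight": int(row["weight"]),
        }
        pairs.append((attributes, row["category"]))
    return pairs


# Each pair is self-contained: nothing from earlier records is needed
# to decide the category of the current one.
for attributes, category in split_records(RAW_RECORDS):
    print(attributes, "->", category)
```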
See the examples/opf/experiments/spatial_classification directory for some examples of OPF spatial classification experiments.
Three settings in the description.py file distinguish a spatial classification experiment from a temporal prediction experiment (see examples/opf/experiments/spatial_classification/base/description.py for reference):
- config['modelParams']['inferenceType'] is 'NontemporalClassification' (in a temporal prediction experiment it is typically 'TemporalMultistep').
- config['modelParams']['clParams']['steps'] is '0'.
- The encoder that encodes the predicted field has 'classifierOnly' set to True.
The inferenceType setting tells the OPF what algorithmic components to put into the model. In the current implementation, 'NontemporalClassification' tells the OPF to build a model that contains only encoders and a classifier (no spatial or temporal pooler).
The config['modelParams']['clParams']['steps'] setting of '0' tells the classifier to associate the current input with the current value of the predicted field, that is, to classify zero steps ahead.
The 'classifierOnly' flag on the predicted field's encoder tells the OPF not to feed this field into the bottom of the network. It is fed only into the classifier, as the classification input. NOTE: The name of the predicted field is defined in the control['inferenceArgs']['predictedField'] entry of description.py.
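The settings described above can be pictured as the following fragment. It is only a sketch: the keys named in the text come from this page, but the 'sensorParams'/'encoders' nesting, the encoder names, and the 'field1' input are assumptions made for illustration, and all other encoder and classifier parameters are omitted. The helper looks_like_spatial_classification is hypothetical and not part of NuPIC; consult the base description.py for the real layout.

```python
# Illustrative fragment only; not a complete description.py.
config = {
    'modelParams': {
        # 1. Build a model with only encoders and a classifier.
        'inferenceType': 'NontemporalClassification',
        'sensorParams': {            # assumed nesting
            'encoders': {
                # Hypothetical ordinary input field.
                'field1': {'fieldname': 'field1'},
                # 3. The predicted field goes to the classifier only.
                '_classifierInput': {
                    'fieldname': 'classification',
                    'classifierOnly': True,
                },
            },
        },
        # 2. Associate the current input with the current label.
        'clParams': {'steps': '0'},
    },
}

control = {
    'inferenceArgs': {'predictedField': 'classification'},
}


def looks_like_spatial_classification(config, control):
    """Check the three settings named in the text (hypothetical helper)."""
    params = config['modelParams']
    predicted = control['inferenceArgs']['predictedField']
    encoders = params['sensorParams']['encoders'].values()
    classifier_only = any(
        enc.get('fieldname') == predicted and enc.get('classifierOnly') is True
        for enc in encoders
    )
    return (
        params['inferenceType'] == 'NontemporalClassification'
        and str(params['clParams']['steps']) == '0'
        and classifier_only
    )


print(looks_like_spatial_classification(config, control))
```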
To run the 'category_1' example in the OPF, execute the following commands:
cd examples/opf/experiments/spatial_classification
python $NTA/share/opf/bin/OpfRunExperiment.py category_1
This will produce an output .csv file in category_1/inference that contains the input and output of the model at each time step. This particular experiment gets its input from the datasets/category_1.csv file.
In the category_1 example, every row of the input file contains a value for the 'classification' input (the predicted field in this particular experiment). This allows the model to update its learning on every record. You may instead want the model to perform inference only on some records. Putting 'None' in as the value of the predicted field in a row sends that row to the model and produces an output without updating the learning.
For example, the last few rows of the category_1.csv file contain:
12,12
0,0
41,41
36,36
22,22
If you replace the predicted field value with None in some rows, the OPF performs inference only (no learning) on those specific records and continues to learn from the others. In the version below, that applies to the third through fifth rows:
12,12
0,0
None,41
None,36
None,22
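A substitution like the one shown above can be scripted. The following is a minimal sketch using only Python's csv module; the function name mark_inference_only and its parameters are made up for this example. It assumes the predicted field is the first column, as in the rows shown, and that any header rows in a real input file are skipped by passing their count as header_rows.

```python
import csv
import io


def mark_inference_only(csv_text, predicted_index, first_row, header_rows=0):
    """Set the predicted field to 'None' from data row first_row onward.

    Rows are counted from 0 after skipping header_rows header lines.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line_number, row in enumerate(reader):
        data_row = line_number - header_rows
        if data_row >= first_row and row:
            # The literal string 'None' marks the record as inference-only.
            row[predicted_index] = "None"
        writer.writerow(row)
    return out.getvalue()


# The last few rows quoted in the text.
SAMPLE = "12,12\n0,0\n41,41\n36,36\n22,22\n"
print(mark_inference_only(SAMPLE, predicted_index=0, first_row=2), end="")
```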
As mentioned in the introduction, a spatial classification model does not need historical inputs in order to correctly classify the current input. An important distinction, however, is that a NuPIC model is generally learning online at all times (unless you have disabled learning for a specific record by putting in None for the predicted value). So, even though the prior history is not strictly used to classify the current input, the prior history still impacts the learned state of the model after each record.
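To make the distinction between classifying and learning concrete, here is a toy frequency-count classifier. It is not the NuPIC classifier and says nothing about how the OPF is implemented; the class ToyOnlineClassifier is invented for this illustration. It only mirrors the behaviour described above: each record is classified from the current learned state alone, and that state is updated afterwards unless the label is None.

```python
from collections import Counter, defaultdict


class ToyOnlineClassifier:
    """Toy stand-in for an online classifier; NOT the NuPIC classifier."""

    def __init__(self):
        # Maps an attribute tuple to counts of the labels seen with it.
        self.counts = defaultdict(Counter)

    def run(self, attributes, label=None):
        """Classify one record, then learn from it unless label is None."""
        key = tuple(sorted(attributes.items()))
        seen = self.counts.get(key)  # .get() does not create an entry
        prediction = seen.most_common(1)[0][0] if seen else None
        if label is not None:
            # Online learning: the state changes after every labelled record.
            self.counts[key][label] += 1
        return prediction


clf = ToyOnlineClassifier()
item = {"packaging": "glass", "height": 6, "weight": 16}

first = clf.run(item, "salad dressing")  # nothing learned yet
second = clf.run(item)                   # inference only, state untouched
print(first, second)
```

In this toy, the output for a record depends only on that record and the learned counts, never on the order of the immediately preceding inputs, yet every labelled record still changes what later records will be classified as.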
The current spatial classification support in the OPF is a very simple, early implementation, and little effort has yet been put into optimizing its performance across datasets. The principal goal was to get the basic plumbing in place. Future work will focus on improving its performance, perhaps by using different network topologies and/or different classifiers.
- TODO: What classifier algorithm(s) are used?
- TODO: Does this Spatial classification predict only "output label" fields, or can it predict also "expected input bits"? As in reconstruction, eg. for noisy inputs.
- TODO: From above: "So, even though the prior history is not strictly used to classify the current input, the prior history still impacts the learned state of the model after each record. " So, how do I pass a sample for classification, without anyhow modifying the state of the network? I can't set all the fields to None. ...