
The Gotcha Collection 🤔


This page is a collection of things we want to refer back to in case we forget about them. Some of these come up more often than others, but since there's more than one person on the project it is useful to share this knowledge in writing.

Conv2D layers take input with 4 dimensions.

Conv2D layers are designed for images. A '2D' image has pixels along an X and a Y axis; these are the two dimensions in the name '2D'. We will use the Conv2D layer on input derived from text, and this reveals a gotcha: although 2D images get their name from the X and Y axes, they also have a third dimension for the colour channels (RGB, RGBA, etc.), which is 3, 4, etc. elements deep.
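
To make the expected shape concrete, here is a minimal sketch assuming tf.keras (the shapes, filter count, and kernel size are illustrative, not from our experiments). It shows the 4D input Conv2D wants for images: (batch, height, width, channels).

```python
import numpy as np
import tensorflow as tf

# A batch of 8 RGB images, 32x32 pixels: (batch, height, width, channels).
images = np.random.rand(8, 32, 32, 3).astype("float32")

conv = tf.keras.layers.Conv2D(filters=16, kernel_size=(3, 3))
out = conv(images)
print(out.shape)  # (8, 30, 30, 16) -- spatial dims shrink without padding
```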

If we want to model review text, the natural approach is to stack word vectors on top of each other (check figure one of this paper), so we would naturally expect to use a Conv2D layer. However, there is no third dimension to match RGB or RGBA, so we set the channel dimension (the RGB/RGBA dimension) to a size of 1. We could possibly use a Conv1D layer to eliminate the last dimension, although I didn't think of that when I created the earlier experiments.
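
A sketch of both options, again assuming tf.keras; the review length, embedding size, filter count, and kernel width below are hypothetical placeholders. The Conv2D version adds a trailing channel axis of size 1; the Conv1D version drops it entirely.

```python
import numpy as np
import tensorflow as tf

max_words = 100   # hypothetical padded review length
embed_dim = 50    # hypothetical embedding size

# Each review is a (max_words, embed_dim) matrix of stacked word vectors;
# the trailing axis of size 1 is the channel dimension Conv2D requires.
reviews = np.random.rand(8, max_words, embed_dim, 1).astype("float32")

# The kernel spans a window of 5 words and the full embedding width.
conv2d = tf.keras.layers.Conv2D(filters=16, kernel_size=(5, embed_dim))
print(conv2d(reviews).shape)  # (8, 96, 1, 16)

# The Conv1D alternative mentioned above needs no channel axis at all.
conv1d = tf.keras.layers.Conv1D(filters=16, kernel_size=5)
print(conv1d(reviews[..., 0]).shape)  # (8, 96, 16)
```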

The remaining (fourth) dimension used by Conv2D is the batch dimension, with one feature set per slot: for us, one review per discrete position. The side-effect of this is that all reviews in the batch must share the same shape to fit in one tensor, so we must pad every review to equal the word count of the longest review.
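
A minimal sketch of the padding step, assuming tf.keras's pad_sequences utility (the integer-encoded reviews below are made up for illustration):

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hypothetical integer-encoded reviews of unequal length.
reviews = [
    [4, 10, 2],
    [7, 3, 9, 12, 5],
    [1, 8],
]

# With no maxlen given, every review is padded (here with trailing
# zeros) to the word count of the longest review.
padded = pad_sequences(reviews, padding="post")
print(padded.shape)  # (3, 5)
```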

How do Neural Networks take embeddings as inputs to a node?

When we use an embedding layer it creates the embeddings for us. The embedding layer holds a matrix that maps each word to that word's embedding. We then input the reviews as sequences of integer word indices, and the embedding layer performs the lookup for us.
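
A minimal sketch of that lookup, assuming tf.keras; the vocabulary size, embedding size, and indices are hypothetical:

```python
import numpy as np
import tensorflow as tf

vocab_size = 1000  # hypothetical vocabulary size
embed_dim = 50     # hypothetical embedding size

embedding = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)

# One integer-encoded review, padded to length 5: one index per word slot.
indices = np.array([[12, 47, 5, 0, 0]])
vectors = embedding(indices)
print(vectors.shape)  # (1, 5, 50) -- one 50-dim vector per word slot
```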
