
SquaredDifference values halved? #85

Open
adampl opened this issue Nov 3, 2015 · 12 comments

Comments

@adampl

adampl commented Nov 3, 2015

Hi everyone!

I'm using brainstorm for multivariable regression with an LSTM network, and I've just noticed that the SquaredDifference.outputs.default values are exactly half of the true squared errors.
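The observed behaviour can be illustrated with plain NumPy (this is a hypothetical reproduction of the reported ratio, not brainstorm's actual API):

```python
import numpy as np

# What the layer reportedly returns is 0.5 * (x - y)**2,
# i.e. exactly half of the true squared difference.
x = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 0.0])

true_sq_diff = (x - y) ** 2        # [1.0, 4.0, 9.0]
layer_output = 0.5 * (x - y) ** 2  # [0.5, 2.0, 4.5]

print(layer_output / true_sq_diff)  # every ratio is exactly 0.5
```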

FYI: using Python 3.3.3 with the latest version of brainstorm from master.

By the way, a regression example would be welcome :)

@flukeskywalker
Collaborator

Good point. You're right, the layer implementation actually computes half of the squared difference (and the gradients accordingly), so SquaredDifference is a misnomer. We should do something about this. (CC @Qwlouse)

As to the reason: this is basically because we wanted to use this layer for regression in the way some libraries do (e.g. Caffe's EuclideanLossLayer), so the results match for someone coming from another library. Libraries such as Chainer use the 'correct' error though, so maybe we should switch to that.

We should have a regression example. Any suggestions for the task?

@untom
Collaborator

untom commented Nov 3, 2015

This is probably the only way to make gradient checking work correctly, though. If we didn't include the factor of 1/2, we'd have to multiply the errors by 2 during backprop to make everything "mathematically correct". In practice we could of course just say that the constant gets absorbed into the learning rate, but it's technically correct to include the 1/2, IMO.
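The point about the 1/2 and the factor of 2 can be checked numerically with a finite-difference sketch (plain NumPy-free Python, not brainstorm's gradient checker):

```python
# For f(x) = 0.5 * (x - t)**2 the analytic gradient is (x - t),
# while for g(x) = (x - t)**2 it is 2 * (x - t).
t = 3.0
x = 5.0
eps = 1e-6

# central finite difference for the halved loss: matches (x - t)
num_grad_f = ((0.5 * (x + eps - t) ** 2) - (0.5 * (x - eps - t) ** 2)) / (2 * eps)
assert abs(num_grad_f - (x - t)) < 1e-6

# finite difference for the full squared difference: needs the factor 2
num_grad_g = (((x + eps - t) ** 2) - ((x - eps - t) ** 2)) / (2 * eps)
assert abs(num_grad_g - 2 * (x - t)) < 1e-6
```

So either convention passes a gradient check, as long as the backward pass carries the matching factor.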

@flukeskywalker
Collaborator

The backward pass implementation could simply multiply deltas by 2, so the gradient check would work fine.

Edit: I meant to say: sure, we'll have to modify the backward pass, but that's not a reason not to compute the 'correct' squared difference.

@adampl
Author

adampl commented Nov 4, 2015

Or just warn about this somewhere in the docs, though it seems less elegant.

We should have a regression example. Any suggestions for the task?

I suggest time series prediction using LSTM :)

@flukeskywalker
Collaborator

Here's a plan for this issue. We'll change the SquaredDifference layer so that it computes the correct squared difference, and we'll add a new layer (let's call it SquaredLoss, subject to change) which will be used similarly to the SoftmaxCE layer, but for regression. Differences:

  1. SquaredLoss will compute half of the squared error, to be consistent with implementations such as UFLDL, Caffe etc. This will be specified in the docs.
  2. SquaredLoss will have inputs named default and targets, whereas SquaredDifference has inputs named inputs_1 and inputs_2.
  3. SquaredLoss will have outputs named predictions and loss, whereas SquaredDifference has only default.
  4. SquaredLoss will only compute gradients w.r.t. the default inputs, whereas SquaredDifference computes them for both inputs.
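As a rough sketch, the proposed semantics might look like this (hypothetical helper functions using the input/output names from the plan above, not brainstorm's actual layer code):

```python
import numpy as np

def squared_loss_forward(default, targets):
    # Half squared error, Caffe/UFLDL-style (point 1 of the plan).
    loss = 0.5 * (default - targets) ** 2
    # Outputs named 'predictions' and 'loss' (point 3).
    return {"predictions": default, "loss": loss}

def squared_loss_backward(default, targets):
    # Gradient only w.r.t. 'default'; none for 'targets' (point 4).
    # d/d(default) [0.5 * (default - targets)**2] = default - targets
    return default - targets

out = squared_loss_forward(np.array([2.0]), np.array([1.0]))
print(out["loss"])                                    # [0.5]
print(squared_loss_backward(np.array([2.0]), np.array([1.0])))  # [1.0]
```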

@flukeskywalker
Collaborator

I'm done with making the above change in a private branch. I've named the new layer SquaredLoss, but perhaps SquaredError or something else would be better? (Caffe calls it EuclideanLoss).

@sjoerdvansteenkiste
Contributor

How about EuclideanRE (consistent with SoftmaxCE)?

@Qwlouse
Collaborator

Qwlouse commented Nov 5, 2015

I don't really like Euclidean because it doesn't mean much in the neural networks community. But CE is a good suffix for consistency. Maybe GaussianCE or RegressionCE would be better?

@flukeskywalker
Collaborator

I agree, Euclidean loss is not a name commonly used in the NN literature; MSE is. CE is a good suffix, but also not commonly used in a regression context.
A proposal that tries to respect both convention and accuracy: the new layer is called SquaredError or SE (since it doesn't compute the mean), and the older layer is renamed to EuclideanDistance. We may later add a CosineDistance layer (for siamese nets etc.).

@flukeskywalker
Collaborator

I now realize that EuclideanDistance would clearly not be a correct name either.

@adampl
Author

adampl commented Nov 7, 2015

HalvedSquaredDifference? :)

@flukeskywalker
Collaborator

:D Good point. However, I think SquaredError is probably the best name for the new layer, even though it halves the error. The Error suffix can act as a signal that the layer is special (like the CE layers, it does not compute gradients for the targets). It would be easily recognizable, and we would note in the docs that it computes half of the squared error, as defined in some textbooks.

I have already changed the older SquaredDifference layer such that it computes the actual squared difference.

I looked around a bit for an LSTM regression dataset that would be as recognizable as MNIST/CIFAR, but didn't really find one. Open to suggestions.
