Don't you accumulate the validation gradients too while training? #57

Open
burak43 opened this issue Feb 24, 2020 · 2 comments

Comments


burak43 commented Feb 24, 2020

In train_i3d.py, you call loss.backward() in both the train and val phases. Doesn't that accumulate gradients for the validation loss too, regardless of putting the model in eval mode (since eval mode only affects the behaviour of some layers such as dropout and batch norm)? Is there something specific to PyTorch 0.3.0 that blocks validation gradient accumulation?
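
As a minimal sketch of what I mean (a toy linear model, not the repo's I3D code), eval mode alone does not stop backward() from filling the .grad buffers:

```python
# Toy sketch (not train_i3d.py): model.eval() only changes layer behaviour
# such as dropout/batch norm; loss.backward() still accumulates gradients.
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
x, y = torch.randn(8, 4), torch.randn(8, 2)

model.eval()                                  # eval mode
loss = nn.functional.mse_loss(model(x), y)
loss.backward()                               # gradients are still computed

print(model.weight.grad is not None)          # True: .grad was populated
```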

piergiaj (Owner) commented

These lines:
https://github.com/piergiaj/pytorch-i3d/blob/master/train_i3d.py#L115-L119

only apply the gradient step when in training mode. Combined with https://github.com/piergiaj/pytorch-i3d/blob/master/train_i3d.py#L86, the gradients from the validation step are never applied.

For efficiency, the loss.backward() could be removed from the validation step, but since those gradients are never applied, it does not impact model accuracy.
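
As a runnable toy sketch of that structure (the num_steps_per_update name and the train/val phase loop follow train_i3d.py; the model, data, and loss are stand-ins):

```python
# Toy sketch of the loop structure described above (num_steps_per_update and
# the train/val phase loop follow train_i3d.py; model, data, loss are stand-ins).
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = [(torch.randn(8, 4), torch.randn(8, 2)) for _ in range(6)]
num_steps_per_update = 4

for phase in ['train', 'val']:
    model.train(phase == 'train')              # eval mode during validation
    optimizer.zero_grad()                      # clears any leftover gradients
    num_iter = 0
    for x, y in data:
        loss = nn.functional.mse_loss(model(x), y) / num_steps_per_update
        loss.backward()                        # runs in both phases, as in the repo
        num_iter += 1
        if num_iter == num_steps_per_update and phase == 'train':
            optimizer.step()                   # gradient step reached only while training
            optimizer.zero_grad()
            num_iter = 0
    # in the 'val' phase the accumulated gradients are never applied; they are
    # discarded by the zero_grad() call at the start of the next phase/epoch
```

Wrapping the validation forward pass in torch.no_grad() (or dropping the backward() call there) would save compute and memory, but as noted it would not change the learned weights.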


burak43 commented Feb 25, 2020

> These lines:
> https://github.com/piergiaj/pytorch-i3d/blob/master/train_i3d.py#L115-L119
>
> only apply the gradient step when in training mode. Combined with https://github.com/piergiaj/pytorch-i3d/blob/master/train_i3d.py#L86, the gradients from the validation step are never applied.
>
> For efficiency, the loss.backward() could be removed from the validation step, but since those gradients are never applied, it does not impact model accuracy.

I see. Then, as I said in #44 (comment), when len(dataloader) is not a multiple of num_steps_per_update, the leftover accumulated training gradients are zeroed before optimizer.step() is ever called on them, when the phase changes from training to validation. As a result, the losses from those leftover training forward passes are never used.
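
A hypothetical fix (just a sketch on a toy model, not something in the repo) would be to flush the remainder before leaving the training phase:

```python
# Hypothetical remedy (toy sketch, not code from the repo): flush any partially
# accumulated gradients so the leftover len(dataloader) % num_steps_per_update
# batches still contribute an optimizer step.
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = [(torch.randn(8, 4), torch.randn(8, 2)) for _ in range(6)]  # 6 batches
num_steps_per_update = 4                                           # 6 % 4 -> 2 leftover

optimizer.zero_grad()
num_iter = 0
for x, y in data:
    loss = nn.functional.mse_loss(model(x), y) / num_steps_per_update
    loss.backward()
    num_iter += 1
    if num_iter == num_steps_per_update:
        optimizer.step()
        optimizer.zero_grad()
        num_iter = 0

if num_iter > 0:              # the two leftover batches in this toy example
    optimizer.step()          # apply the partial accumulation instead of dropping it
    optimizer.zero_grad()
```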
