accumulate gradient with the new gradient API? #707
Comments
Just sum the losses you're interested in; the gradient for the sum will be the same as the sum of the gradients.
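For instance, a minimal sketch of that suggestion with the implicit-parameter API (here m, opt, loss, d1, and d2 are placeholder names, not anything defined in this thread):

using Flux

ps = Flux.params(m)
# Summing the per-batch losses inside one gradient call gives the same
# result as summing the per-batch gradients, by linearity of the gradient.
gs = Flux.gradient(ps) do
    loss(m, d1) + loss(m, d2)
end
Flux.Optimise.update!(opt, ps, gs)

The trade-off raised in the next comment is that both batches stay alive in the same backward pass, so peak memory grows with the number of batches you sum.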
Wouldn't that require more memory?
Yes, true. The other option is just to define
@chengchingwen I added this to the community triage for discussion. Do you have any more comments on this? I'm inclined to say that this isn't something that needs an explicit function in Flux, but I am curious to hear if there is a use-case where this would be really helpful. Note that you can write this quite elegantly with

map(params(m), grads) do p, g
    update!(opt, p, g)
end

or something.
For generic optimizers, what you suggest is not equivalent to first accumulating gradients then performing a single update.
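To see why, here is a toy illustration (not from the thread) with a stateful optimiser: each update! call with Momentum also advances the optimiser's velocity state, so two small updates differ from one update with the summed gradient.

using Flux

p1 = [0.0]
opt1 = Momentum(0.1, 0.9)
Flux.Optimise.update!(opt1, p1, [1.0])   # two updates with gradient 1.0 each
Flux.Optimise.update!(opt1, p1, [1.0])

p2 = [0.0]
opt2 = Momentum(0.1, 0.9)
Flux.Optimise.update!(opt2, p2, [2.0])   # one update with the accumulated gradient 2.0

@show p1 p2   # the two parameters end up at different values

With a stateless optimiser like Descent the two coincide, which is why the distinction only matters once the optimiser keeps state.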
This is useful when one wants to perform updates with a large batch size but is memory-constrained.
Also useful for data parallelism.
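A rough sketch of that memory-constrained pattern, accumulating gradients over several small batches and then applying a single update (m, opt, loss, and batches are placeholders, and this assumes the implicit-params Zygote API):

using Flux

ps = Flux.params(m)
acc = IdDict(p => zero(p) for p in ps)        # running sum of gradients, keyed by parameter
for d in batches
    gs = Flux.gradient(() -> loss(m, d), ps)  # gradient for one small batch
    for p in ps
        gs[p] === nothing && continue         # parameter not used in this batch
        acc[p] .+= gs[p]
    end
end
for p in ps
    Flux.Optimise.update!(opt, p, acc[p])     # one optimiser step with the accumulated gradient
end

Depending on how the loss is normalised, you may want to divide the accumulated gradient by the number of batches before the update.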
Oh I totally missed the accumulation part. Yeah perhaps we can open an issue on Zygote to track gradient algebra in general?
During triage, people were fairly enthusiastic about supporting something like this, and there were not too many concerns about adding the functionality. The implementation approach that we discussed was to make broadcasting work on gradient objects, so that

gs1 .+ gs2    # becomes map(+, gs1, gs2)
gs1 .+= gs2   # becomes map!(+, gs1, gs1, gs2)
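Assuming broadcasting over gradient objects ends up working as sketched above, usage might look roughly like this (again, m, opt, loss, and batches are placeholder names):

ps = Flux.params(m)
gs = Flux.gradient(() -> loss(m, first(batches)), ps)
for d in Iterators.drop(batches, 1)
    gs .+= Flux.gradient(() -> loss(m, d), ps)   # in-place accumulation, i.e. map!(+, gs, gs, ...)
end
Flux.Optimise.update!(opt, ps, gs)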
Since this is the first hit when searching the internet for how to do gradient accumulation in Flux.jl, maybe it's helpful to add here that this is now implemented in (Just realized this after spending quite some time writing it myself, which can be annoying if you already have a complicated training loop.)
I want to accumulate the gradients from multiple batches of data before applying an update. In the old API, I can just run back!(loss) multiple times and then call update!(opt, params(model)). However, with the new gradient API, I have to collect all the Grad(...) beforehand and then call update!(opt, params(model), grad) multiple times. Are there any better ways to do this?
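For reference, the pattern the question describes with the new API might look roughly like the sketch below (model, opt, loss, and batches are placeholders); the replies in this thread discuss how to avoid holding every gradient at once.

ps = Flux.params(model)
grads = [Flux.gradient(() -> loss(model, d), ps) for d in batches]   # collect every gradient up front
for g in grads
    Flux.Optimise.update!(opt, ps, g)                                # then one update! call per gradient
end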