Question about gradient calculation in the quantizer #27
Comments
I will look into the first one; can you give more details about the second point? |
I wonder how to deal with the gradient when I apply operations such as "tf.round". Will setting the gradient of these operations to 1 help? Could you offer some references? Thank you very much! |
IIRC, I manually overrode the gradient so that it is just the identity. This is the 'straight-through' estimator, which works surprisingly well in practice.
|
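For reference, the straight-through trick is commonly written as `y = x + tf.stop_gradient(tf.round(x) - x)`: the forward pass equals `round(x)`, while the backward pass sees the identity because the detached residual contributes no gradient. A minimal NumPy sketch of the forward behavior (this is an illustration, not this repository's exact code):

```python
import numpy as np

def ste_round_forward(x):
    # The residual round(x) - x would be wrapped in tf.stop_gradient
    # in TensorFlow, so it acts as a constant in the backward pass.
    detached = np.round(x) - x
    # Numerically this equals np.round(x); the gradient w.r.t. x is 1.
    return x + detached

x = np.array([0.2, 1.7, -0.6])
print(ste_round_forward(x))  # matches np.round(x)
```

The key point is that the gradient path through the undetached `x` term stays open, so layers before the quantizer still receive gradients.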
Thanks a lot for your answer. I have another question: how does the batch size affect the result? Have you ever tried a bigger batch size? Thanks again for your reply! |
The batch size is limited by GPU memory. In recent papers about learned image compression the batch size is usually set to something low like 8 or 16, so I imagine that other hyperparameters would be more important to tune. |
I found that you used "tf.stop_gradient()" to deal with the non-differentiable ops "tf.argmin()" and "tf.round()". However, "tf.stop_gradient()" blocks the gradient contribution of its input, which means your encoder network will not update its parameters, since all nodes before the quantizer (specifically the encoder network) are excluded from the gradient calculation.
Are you trying to make the quantizer have a fixed gradient (such as 1) at all times? If so, I think you have to re-define the gradient of the quantizer rather than use "tf.stop_gradient()".
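To make the distinction concrete, here is a minimal forward-mode autodiff sketch (the `Dual`, `stop_gradient`, and `round_op` names are illustrative, not TensorFlow API). It shows that wrapping the whole quantizer in a stop-gradient kills the encoder gradient, while the residual form `x + stop_gradient(round(x) - x)` passes a gradient of 1:

```python
class Dual:
    """Minimal forward-mode autodiff value: (value, derivative w.r.t. x)."""
    def __init__(self, val, grad=0.0):
        self.val, self.grad = val, grad
    def __add__(self, other):
        return Dual(self.val + other.val, self.grad + other.grad)
    def __sub__(self, other):
        return Dual(self.val - other.val, self.grad - other.grad)

def stop_gradient(d):
    # Value flows through; the derivative is cut to zero.
    return Dual(d.val, 0.0)

def round_op(d):
    # Rounding has zero gradient almost everywhere.
    return Dual(float(round(d.val)), 0.0)

x = Dual(1.7, 1.0)  # seed derivative dx/dx = 1

# Blocked: quantizer fully inside the stop-gradient -> no encoder gradient.
y_blocked = stop_gradient(round_op(x))

# Straight-through: x + stop_gradient(round(x) - x) -> gradient is 1.
y_ste = x + stop_gradient(round_op(x) - x)

print(y_blocked.grad)  # 0.0
print(y_ste.val, y_ste.grad)  # 2.0 1.0
```

Both forms produce the same forward value (2.0), but only the straight-through form lets gradients reach whatever produced `x`, which is exactly what the encoder needs.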