Question about gradient calculation in the quantizer #27
Comments
I will look into the first one; can you give more details about the second point? |
I wonder how to deal with the gradient when I apply operations such as "tf.round". Will setting the gradient of these operations to 1 help? Could you offer some references? Thank you very much! |
IIRC, I manually overrode the gradient so that it is just the identity. This is the 'straight-through' estimator, which works surprisingly well in practice.
|
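For reference, the straight-through trick is commonly written as `y = x + tf.stop_gradient(tf.round(x) - x)`: the forward pass equals `round(x)`, while the backward pass sees the identity because the detached residual contributes no gradient. A minimal NumPy sketch of the forward behavior (this is an illustration, not this repository's exact code):

```python
import numpy as np

def ste_round_forward(x):
    # The residual round(x) - x would be wrapped in tf.stop_gradient
    # in TensorFlow, so it acts as a constant in the backward pass.
    detached = np.round(x) - x
    # Numerically this equals np.round(x); the gradient w.r.t. x is 1.
    return x + detached

x = np.array([0.2, 1.7, -0.6])
print(ste_round_forward(x))  # matches np.round(x)
```

The key point is that the gradient path through the undetached `x` term stays open, so layers before the quantizer still receive gradients.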
Thanks a lot for your answer. I have another question: how does the batch size affect the result? Have you ever tried a bigger batch size? Thanks again for your reply! |
The batch size is limited by GPU memory. In recent papers about learned image compression the batch size is usually set to something low like 8 or 16, so I imagine that other hyperparameters would be more important to tune. |
I found that you used "tf.stop_gradient()" to deal with the non-differentiable ops "tf.argmin()" and "tf.round()". However, "tf.stop_gradient()" blocks the gradient contribution of its input, which means your encoder network will not update its parameters, since all nodes before the quantizer (specifically the encoder network) are excluded from the gradient calculation.
Are you trying to make the quantizer have a fixed gradient (such as 1) at all times? If so, I think you have to re-define the gradient of the quantizer rather than use "tf.stop_gradient()".
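To make the distinction concrete, here is a minimal forward-mode autodiff sketch (the `Dual`, `stop_gradient`, and `round_op` names are illustrative, not TensorFlow API). It shows that wrapping the whole quantizer in a stop-gradient kills the encoder gradient, while the residual form `x + stop_gradient(round(x) - x)` passes a gradient of 1:

```python
class Dual:
    """Minimal forward-mode autodiff value: (value, derivative w.r.t. x)."""
    def __init__(self, val, grad=0.0):
        self.val, self.grad = val, grad
    def __add__(self, other):
        return Dual(self.val + other.val, self.grad + other.grad)
    def __sub__(self, other):
        return Dual(self.val - other.val, self.grad - other.grad)

def stop_gradient(d):
    # Value flows through; the derivative is cut to zero.
    return Dual(d.val, 0.0)

def round_op(d):
    # Rounding has zero gradient almost everywhere.
    return Dual(float(round(d.val)), 0.0)

x = Dual(1.7, 1.0)  # seed derivative dx/dx = 1

# Blocked: quantizer fully inside the stop-gradient -> no encoder gradient.
y_blocked = stop_gradient(round_op(x))

# Straight-through: x + stop_gradient(round(x) - x) -> gradient is 1.
y_ste = x + stop_gradient(round_op(x) - x)

print(y_blocked.grad)  # 0.0
print(y_ste.val, y_ste.grad)  # 2.0 1.0
```

Both forms produce the same forward value (2.0), but only the straight-through form lets gradients reach whatever produced `x`, which is exactly what the encoder needs.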