
Question about gradient calculation in the quantizer #27

Open
Chao-Lu opened this issue Jan 24, 2019 · 5 comments
Chao-Lu commented Jan 24, 2019

  1. I found that you used "tf.stop_gradient()" to deal with the non-differentiable operations "tf.argmin()" and "tf.round()". However, "tf.stop_gradient()" ignores the gradient contribution of the node it wraps, which means your encoder network will not update its parameters: every node before the quantizer (i.e. the whole encoder) is excluded from the gradient calculation.

  2. Are you trying to make the quantizer have a fixed gradient value (such as 1) at all times? If so, I think you have to re-define the gradient of the quantizer rather than use "tf.stop_gradient()" on its own (see the sketch after this list).
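For reference, the usual way "tf.stop_gradient()" appears in a quantizer is the straight-through pattern below. This is a minimal sketch assuming a plain rounding quantizer, not necessarily this repository's implementation; the point is that the identity term "x" stays outside "stop_gradient", so encoder gradients are not blocked:

```python
import tensorflow as tf

def quantize_straight_through(x):
    # Forward pass: evaluates to tf.round(x) exactly.
    # Backward pass: stop_gradient blocks the (round(x) - x) correction,
    # so the quantizer behaves like the identity (gradient of 1) and
    # upstream gradients still reach the encoder.
    return x + tf.stop_gradient(tf.round(x) - x)
```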

Chao-Lu closed this as completed Jan 25, 2019
Justin-Tan reopened this Jan 25, 2019
Justin-Tan (Owner)

I will look into the first point. Can you give more details about the second?


sun107 commented May 25, 2021

I wonder how to deal with the gradient when I apply operations such as "tf.round". Would setting the gradient of these operations to 1 help? Could you offer some references? Thank you very much!
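For concreteness, "setting the gradient to 1" is the straight-through estimator (Bengio et al., 2013, "Estimating or Propagating Gradients Through Stochastic Neurons"). A minimal sketch of how that could be spelled out explicitly with "tf.custom_gradient", offered as an illustration rather than this repository's code:

```python
import tensorflow as tf

@tf.custom_gradient
def round_st(x):
    def grad(dy):
        # Treat d(round(x))/dx as 1: pass upstream gradients through unchanged.
        return dy
    # Forward pass: ordinary rounding.
    return tf.round(x), grad
```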

Justin-Tan (Owner) commented May 25, 2021 via email


sun107 commented May 28, 2021

Thanks very much for your answer. I have another question: how does the batch size affect the result? Have you ever tried a larger batch size? Thanks again for your reply!

Justin-Tan (Owner)

The batch size is limited by GPU memory. In recent papers on learned image compression, the batch size is usually set to something low, like 8 or 16, so I imagine other hyperparameters would be more important to tune.
