Hi,
Hello, Brevitas supports several strategies for scale factor computation/update. We do support a standalone learned scale factor both for activations and parameters, but that is not the only option. Leaving aside the trivial case of a constant scale factor, you can also have a learning strategy where, for a certain number of steps, the scale factor is based on statistics, while in the background an EMA of those statistics is updated. After this number of steps, the EMA value is used to initialize a learned scale factor, which is then updated through backpropagation for the remainder of training. Another option is simply a statistics-based approach, where the scale depends on some statistics over the parameters or the activations. Weights and activations can use different strategies, and in general the user has a lot of fine-grained control over the hyper-parameters of all these strategies.
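To make the "statistics first, learned parameter afterwards" option more concrete, here is a minimal plain-PyTorch sketch of the mechanism rather than Brevitas internals. The class name, step count, and decay value are made up for illustration, and the module models only the scale computation itself:

```python
import torch
import torch.nn as nn


class EmaThenLearnedScale(nn.Module):
    """Illustrative sketch (not Brevitas code): for the first `collect_steps`
    forward passes the scale comes from an abs-max statistic, while an EMA of
    that statistic is tracked in the background; afterwards the EMA value seeds
    a learned parameter that is updated by backprop for the rest of training."""

    def __init__(self, collect_steps=300, ema_decay=0.99):
        super().__init__()
        self.collect_steps = collect_steps
        self.ema_decay = ema_decay
        self.register_buffer("steps", torch.zeros((), dtype=torch.long))
        self.register_buffer("ema", torch.ones(()))
        # Registered up front so the optimizer picks it up; its value is
        # overwritten with the EMA once the collection phase ends.
        self.learned_scale = nn.Parameter(torch.ones(()))

    def forward(self, x):
        if self.steps < self.collect_steps:
            if self.training:
                stat = x.detach().abs().max()
                # Update the EMA of the statistic in the background.
                self.ema.mul_(self.ema_decay).add_((1.0 - self.ema_decay) * stat)
                self.steps += 1
                if self.steps == self.collect_steps:
                    # Switch over: seed the learned scale from the accumulated EMA.
                    with torch.no_grad():
                        self.learned_scale.copy_(self.ema)
                return stat  # statistics-based scale during collection
            return self.ema  # eval during the collection phase
        return self.learned_scale  # learned via backprop afterwards
```

In a real implementation the learned scale would typically also be constrained to stay positive (for instance by learning its log); that detail is omitted here to keep the sketch short.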
Yes, that is basically one aspect of it.
The other option is to accumulate statistics over your tensor during training time and compute the scale factor based on that.
At eval time, the scale factor is then fixed. This applies to activations, where you are dealing with runtime tensors.
For weights, you can simply have a scale factor which depends on some statistics of the weights.
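A compact sketch of that split, again in plain PyTorch rather than Brevitas internals (the module name, momentum, and bit width are illustrative assumptions): the activation scale comes from a running statistic that is updated in training mode and frozen in eval mode, while the weight scale is derived directly from the weight tensor's statistics.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def _fake_quant(t, scale, qmin, qmax):
    q = torch.clamp(torch.round(t / scale), qmin, qmax) * scale
    # Straight-through estimator: forward returns q, backward sees identity.
    return t + (q - t).detach()


class StatsQuantLinear(nn.Module):
    """Illustrative sketch (not Brevitas code): activation scale from running
    statistics collected at training time and fixed at eval time; weight scale
    computed from statistics of the weights themselves."""

    def __init__(self, in_features, out_features, momentum=0.1, bits=8):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.momentum = momentum
        self.qmax = 2 ** (bits - 1) - 1
        self.register_buffer("running_act_max", torch.ones(()))

    def forward(self, x):
        if self.training:
            # Accumulate statistics over the runtime (activation) tensor.
            batch_max = x.detach().abs().max()
            self.running_act_max.mul_(1 - self.momentum).add_(self.momentum * batch_max)
            act_scale = batch_max.clamp_min(1e-8) / self.qmax
        else:
            # At eval time the activation scale is fixed to the running value.
            act_scale = self.running_act_max / self.qmax

        # The weight scale depends only on statistics of the weight tensor.
        w = self.linear.weight
        w_scale = w.detach().abs().max().clamp_min(1e-8) / self.qmax

        x_q = _fake_quant(x, act_scale, -self.qmax - 1, self.qmax)
        w_q = _fake_quant(w, w_scale, -self.qmax - 1, self.qmax)
        return F.linear(x_q, w_q, self.linear.bias)
```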