Hi,
Hello, Brevitas supports several strategies for scale factor computation/update. We do support a standalone learned scale factor both for activations and parameters, but that is not the only option. Leaving aside the trivial case of a constant scale factor, you can also have a learning strategy where, for a certain number of steps, the scale factor is based on statistics, while in the background an EMA of those statistics is updated. After this number of steps, the EMA value is used to initialize a learned scale factor, which is then updated through backpropagation for the remainder of training. Another option is simply a statistics-based approach, where the scale depends on some statistics over the parameters or the activations. Weights and activations can use different strategies, and in general the user has a lot of fine-grained control over the hyper-parameters of all these strategies.
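To make the "statistics first, learned parameter afterwards" option more concrete, here is a minimal plain-PyTorch sketch of the mechanism rather than Brevitas internals. The class name, step count, and decay value are made up for illustration, and the module models only the scale computation itself:

```python
import torch
import torch.nn as nn


class EmaThenLearnedScale(nn.Module):
    """Illustrative sketch (not Brevitas code): for the first `collect_steps`
    forward passes the scale comes from an abs-max statistic, while an EMA of
    that statistic is tracked in the background; afterwards the EMA value seeds
    a learned parameter that is updated by backprop for the rest of training."""

    def __init__(self, collect_steps=300, ema_decay=0.99):
        super().__init__()
        self.collect_steps = collect_steps
        self.ema_decay = ema_decay
        self.register_buffer("steps", torch.zeros((), dtype=torch.long))
        self.register_buffer("ema", torch.ones(()))
        # Registered up front so the optimizer picks it up; its value is
        # overwritten with the EMA once the collection phase ends.
        self.learned_scale = nn.Parameter(torch.ones(()))

    def forward(self, x):
        if self.steps < self.collect_steps:
            if self.training:
                stat = x.detach().abs().max()
                # Update the EMA of the statistic in the background.
                self.ema.mul_(self.ema_decay).add_((1.0 - self.ema_decay) * stat)
                self.steps += 1
                if self.steps == self.collect_steps:
                    # Switch over: seed the learned scale from the accumulated EMA.
                    with torch.no_grad():
                        self.learned_scale.copy_(self.ema)
                return stat  # statistics-based scale during collection
            return self.ema  # eval during the collection phase
        return self.learned_scale  # learned via backprop afterwards
```

In a real implementation the learned scale would typically also be constrained to stay positive (for instance by learning its log); that detail is omitted here to keep the sketch short.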
Yes, that is basically one aspect of it.
The other option is to accumulate statistics over your tensor during training time and compute the scale factor based on that.
At eval time, the scale factor is then fixed. This applies to activations, where you are dealing with runtime tensors.
For weights, you can simply have a scale factor which depends on some statistics of the weights.
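A compact sketch of that split, again in plain PyTorch rather than Brevitas internals (the module name, momentum, and bit width are illustrative assumptions): the activation scale comes from a running statistic that is updated in training mode and frozen in eval mode, while the weight scale is derived directly from the weight tensor's statistics.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def _fake_quant(t, scale, qmin, qmax):
    q = torch.clamp(torch.round(t / scale), qmin, qmax) * scale
    # Straight-through estimator: forward returns q, backward sees identity.
    return t + (q - t).detach()


class StatsQuantLinear(nn.Module):
    """Illustrative sketch (not Brevitas code): activation scale from running
    statistics collected at training time and fixed at eval time; weight scale
    computed from statistics of the weights themselves."""

    def __init__(self, in_features, out_features, momentum=0.1, bits=8):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.momentum = momentum
        self.qmax = 2 ** (bits - 1) - 1
        self.register_buffer("running_act_max", torch.ones(()))

    def forward(self, x):
        if self.training:
            # Accumulate statistics over the runtime (activation) tensor.
            batch_max = x.detach().abs().max()
            self.running_act_max.mul_(1 - self.momentum).add_(self.momentum * batch_max)
            act_scale = batch_max.clamp_min(1e-8) / self.qmax
        else:
            # At eval time the activation scale is fixed to the running value.
            act_scale = self.running_act_max / self.qmax

        # The weight scale depends only on statistics of the weight tensor.
        w = self.linear.weight
        w_scale = w.detach().abs().max().clamp_min(1e-8) / self.qmax

        x_q = _fake_quant(x, act_scale, -self.qmax - 1, self.qmax)
        w_q = _fake_quant(w, w_scale, -self.qmax - 1, self.qmax)
        return F.linear(x_q, w_q, self.linear.bias)
```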