I see that the sample code all covers the Attention block or the MLP block. Can AQT int8 only be used for computations involving model parameters? For example, can the QK score calculation and the score * V calculation also use AQT int8?
Yeah, it can. All einsum/dot_general ops can be quantized, not just the ones that touch parameters.
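For reference, here is a minimal sketch of what that could look like with the Flax wrapper. It assumes the `aqt.jax.v2` API (`aqt_flax.AqtEinsum` as a drop-in for `jnp.einsum`, and `config.fully_quantized` for an int8 config); the einsum equations and tensor shapes are illustrative, not from this thread:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn
from aqt.jax.v2 import config as aqt_config
from aqt.jax.v2.flax import aqt_flax


class QuantizedAttentionCore(nn.Module):
  """QK scores and score * V, both computed as int8-quantized einsums."""

  @nn.compact
  def __call__(self, q, k, v):  # q, k, v: [batch, length, heads, head_dim]
    cfg = aqt_config.fully_quantized(fwd_bits=8, bwd_bits=8)
    # AqtEinsum quantizes both operands of the contraction, so it also
    # covers activation-activation products, not just weight matmuls.
    qk_einsum = aqt_flax.AqtEinsum(cfg)
    pv_einsum = aqt_flax.AqtEinsum(cfg)
    scores = qk_einsum('bqhd,bkhd->bhqk', q, k) / jnp.sqrt(q.shape[-1])
    probs = jax.nn.softmax(scores, axis=-1)
    return pv_einsum('bhqk,bkhd->bqhd', probs, v)
```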
For more advanced cases (e.g. a quantized cache), one has to use QTensor and Quantizer.quant directly. We don't have an example of that in the docs or in the mini-model at the moment.
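Since there's no official example yet, the following is only a rough sketch of that manual path, based on my reading of `aqt.jax.v2` (`aqt_quantizer.quantizer_make`, `Quantizer.quant`, `QTensor.dequant`); these names and signatures are assumptions and may differ between AQT versions:

```python
from aqt.jax.v2 import aqt_quantizer

# Assumed helper: builds a Quantizer configured for int8.
quantizer = aqt_quantizer.quantizer_make(8)

def quantize_for_cache(kv):  # kv: [batch, length, heads, head_dim]
  # quant() is assumed to return the QTensor plus a gradient closure;
  # the closure is unused here since the cache is read at inference only.
  qtensor, _ = quantizer.quant(kv, calibration_axes=[-1])
  return qtensor  # carries the int8 values together with their scales

def dequantize_from_cache(qtensor):
  # A QTensor keeps its scale, so it can be dequantized standalone.
  return qtensor.dequant()
```

The point of QTensor here is that it bundles the quantized values with their scales, so a cache can be stored in int8 and dequantized (or fed to a quantized einsum) when attention is computed.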