As we start onboarding more dtypes, we ideally want them to work in as many different situations as possible, so I'm opening this tracker and will update the table as things change. If I should add more columns or rows, or if there are any cells you disagree with, please let me know!
The columns can also compose with each other, but to be explicit:
- training with FSDP2 should compose with low-bit optimizers (see the sketch below)
- inference quantization and KV cache quantization should compose
- and sparsity, IIUC, only works with int8 inference quantization right now
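As a concrete (hedged) sketch of the first bullet, here is roughly what composing fp8 training, FSDP2, and a low-bit optimizer looks like. `convert_to_float8_training`, `fully_shard`, and `AdamW8bit` are real entry points in torchao/PyTorch, but treat the exact import paths as assumptions since they have moved between releases:

```python
import torch.nn as nn
from torch.distributed._composable.fsdp import fully_shard    # FSDP2
from torchao.float8 import convert_to_float8_training         # fp8 training
from torchao.prototype.low_bit_optim import AdamW8bit         # low-bit optimizer

# Toy model standing in for a transformer block stack.
# Assumes torch.distributed is already initialized (e.g. launched via torchrun).
model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096)).cuda()

# Swap nn.Linear modules so the matmuls run in fp8 during training.
convert_to_float8_training(model)

# Shard with FSDP2 (real models would call fully_shard per transformer block first).
fully_shard(model)

# Keep optimizer state in 8 bits: the "Optimizer" column composing with training.
optimizer = AdamW8bit(model.parameters(), lr=1e-4)
```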
| Dtype | Training with FSDP2 | Inference | Optimizer | QAT | KV cache | Notes |
|---|---|---|---|---|---|---|
| Int8 | Experimental | Yes | Yes (LUT based) | Yes | | |
| Int4 | No | Yes | Yes (LUT based) | No | | |
| Fp8 | Yes | Yes | Yes | Not needed | No | |
| NF4 | Yes | Experimental | No | In progress | No | Does not use the quantize API |
| fp6 | No | Yes | No | No | No | |
| UintX/Fpx | In progress | Yes | No | No | No | Still requires more performance work |
| MX: fp8/6/4 with scales | Emulation only | Emulation only | No | Not needed because we can compute in this dtype | No | Pending release of B100 GPUs for acceleration |
| Autoquant | N/A | Yes | N/A | N/A | N/A | Supports int8/4. Fp8 coming next |
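To make the Inference and Autoquant columns concrete, here is a hedged usage sketch. `quantize_`, `int8_weight_only`, and `autoquant` are the torchao entry points I believe are current, but exact paths and defaults may differ across releases:

```python
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int8_weight_only
from torchao import autoquant

def make_model():
    return nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024)).eval()

# Explicitly pick a dtype (a "Yes" in the Inference column).
m1 = make_model()
quantize_(m1, int8_weight_only())

# Or let autoquant benchmark the supported int8/int4 variants per layer and pick the
# fastest (the Autoquant row); it is typically wrapped around torch.compile.
m2 = autoquant(torch.compile(make_model(), mode="max-autotune"))
```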
TODO
- Separate table where the columns are weights, activations, optimizer and gradients
- Separate table where techniques are rows and columns are devices
msaroufim changed the title from "AO dtype composability status" to "AO dtype composability tracker" on Sep 8, 2024
Small correction: the 8-bit and 4-bit optimizers are not exactly INT8 and INT4. They use LUT-based quantization, where the LUT values are defined by Tim Dettmers' "dynamic tree quantization" scheme. (To be even more specific, the 2nd buffer of the INT4 optimizer actually uses affine quantization.)
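To spell out the distinction: affine INT8 quantization snaps values to a uniform integer grid via a scale (and optionally a zero point), while LUT-based quantization stores an index into a codebook of non-uniformly spaced values, which is what the 8-bit/4-bit optimizer states do. A rough sketch of the LUT idea is below; the codebook here is a made-up placeholder, not the actual dynamic tree values, and the block size and function names are assumptions for illustration:

```python
import torch

def lut_quantize(x: torch.Tensor, codebook: torch.Tensor, block_size: int = 256):
    """Block-wise LUT quantization: scale each block by its absmax, then snap every
    value to the nearest codebook entry and store only that entry's index."""
    blocks = x.reshape(-1, block_size)
    scale = blocks.abs().amax(dim=1, keepdim=True).clamp_min(1e-12)   # per-block absmax
    normalized = blocks / scale                                       # now in [-1, 1]
    idx = (normalized.unsqueeze(-1) - codebook).abs().argmin(dim=-1)  # nearest LUT entry
    return idx.to(torch.uint8), scale

def lut_dequantize(idx, scale, codebook, shape):
    # Dequantization is just a table lookup followed by rescaling.
    return (codebook[idx.long()] * scale).reshape(shape)

# Placeholder 256-entry codebook, denser near zero; NOT the real dynamic-tree table.
codebook = torch.linspace(-1, 1, 256) ** 3

state = torch.randn(4, 256)               # stand-in for an optimizer state tensor
idx, scale = lut_quantize(state, codebook)
recovered = lut_dequantize(idx, scale, codebook, state.shape)
```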