inspired by mlabonne's autoquant (check out the thread to find out more about each type of quant). cells do not contaminate each other, so you can run multiple quants in one go without restarting the kernel. it also allows exl2 requanting once the measurement is done, shows progress during the model download, auto uploads your quant and much more, all in one portable jupyter notebook that can be dropped into a runpod for easy use, no colab required.
Note
make sure to fill out your huggingface USERNAME and HF_TOKEN (you can create a token in your settings), otherwise uploading your quant won't work.
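since a quant can take a long time, it can help to check both values before starting. this is a minimal sketch, not the notebook's actual code: `USERNAME` and `HF_TOKEN` come from the note above, while `missing_credentials` is a hypothetical helper, and the assumption that the token needs write access is based on how uploads to the hub normally work.

```python
# Assumed to mirror the two settings named in the note above.
USERNAME = ""   # your Hugging Face username
HF_TOKEN = ""   # an access token (assumed to need write access for uploads)


def missing_credentials(username: str, token: str) -> list:
    """Return the names of the settings that are still empty (hypothetical helper)."""
    missing = []
    if not username.strip():
        missing.append("USERNAME")
    if not token.strip():
        missing.append("HF_TOKEN")
    return missing


problems = missing_credentials(USERNAME, HF_TOKEN)
if problems:
    # Fail early instead of finding out after the quant has finished.
    print("fill in before quantizing:", ", ".join(problems))
```

running this check in the first cell means a forgotten value shows up right away rather than at upload time.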
Note
all quants are uploaded as private, so you can double check them before publishing them to your profile
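for reference, a private upload can be sketched with the `huggingface_hub` library roughly as follows. this is an illustration under assumptions, not the notebook's own implementation: the `<username>/<model>-<quant>` repo naming and the helper names `quant_repo_id` and `upload_private` are hypothetical.

```python
def quant_repo_id(username: str, model_name: str, quant_type: str) -> str:
    """Build a repo id such as 'alice/my-model-gguf' (naming scheme is an assumption)."""
    return f"{username}/{model_name}-{quant_type}"


def upload_private(folder: str, repo_id: str, token: str) -> None:
    """Create the repo as private (if needed) and upload the quant folder to it."""
    # Imported here so the helper above works without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    # private=True keeps the repo hidden until you change its visibility yourself.
    api.create_repo(repo_id=repo_id, private=True, exist_ok=True)
    api.upload_folder(folder_path=folder, repo_id=repo_id)


# Example call (not executed here, it needs a real token and network access):
# upload_private("./my-model-gguf", quant_repo_id("alice", "my-model", "gguf"), "hf_...")
```

once you have checked the files, the repo can be made public from its settings page on the hub.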
supports:
- exl2 (with fast measurement requant)
- awq
- hqq
- gptq
- gguf