Correct me if I'm wrong, but quantizing would require loading the models in their unquantized form (as per the `torch.load` call at line 126 of https://github.com/saharNooby/rwkv.cpp/blob/master/rwkv/convert_pytorch_to_ggml.py). Not to mention how much heavier the unquantized models are on bandwidth.
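For reference, that line boils down to a single full-checkpoint `torch.load`; a rough sketch of why the whole unquantized model ends up in RAM (the path is a placeholder, and this is an illustration rather than the script's exact code):

```python
import torch

# torch.load reads the entire .pth checkpoint into host RAM at once
# (the path here is just a placeholder).
state_dict = torch.load('RWKV-4-Raven-7B-v11x.pth', map_location='cpu')

# Every parameter tensor is now resident in memory before any conversion
# or quantization starts, so peak RAM is roughly the unquantized model size.
total_bytes = sum(t.numel() * t.element_size() for t in state_dict.values())
print(f'{total_bytes / 1024 ** 3:.1f} GiB loaded')
```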
Only PyTorch -> rwkv.cpp conversion requires loading the whole model into RAM; quantization is done tensor-by-tensor. You are right about the bandwidth, though.
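To illustrate what tensor-by-tensor means for memory, here is a minimal sketch; it is not rwkv.cpp's actual quantization code, and the file layout and Q8-style scheme are made up for the example:

```python
import numpy as np

# Toy streaming quantizer: each tensor is read, quantized, and written out
# before the next one is touched, so peak memory is one tensor, not the model.
# Assumed (made-up) layout: uint32 element count followed by float32 values.

def quantize_q8_like(values: np.ndarray) -> tuple[np.float32, np.ndarray]:
    # Symmetric 8-bit quantization of a single tensor: one scale + int8 payload.
    scale = np.float32(max(float(np.abs(values).max()), 1e-8) / 127.0)
    return scale, np.round(values / scale).astype(np.int8)

def quantize_stream(src_path: str, dst_path: str) -> None:
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        while count_bytes := src.read(4):
            count = int(np.frombuffer(count_bytes, dtype=np.uint32)[0])
            tensor = np.frombuffer(src.read(count * 4), dtype=np.float32)
            scale, quantized = quantize_q8_like(tensor)
            dst.write(np.uint32(count).tobytes())
            dst.write(scale.tobytes())
            dst.write(quantized.tobytes())
```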
I have uploaded some quantized RWKV-4-Raven models to HuggingFace at LoganDark/rwkv-4-raven-ggml. Conversion took about 2 hours, and upload took about 24 hours and 500GB of disk space.
At the time of writing, the available models are:
| Name | f32 | f16 | Q4_0 | Q4_1 | Q4_2 | Q5_1 | Q8_0 |
|------|-----|-----|------|------|------|------|------|
| RWKV-4-Raven-1B5-v11-Eng99-20230425-ctx4096 | Yes | Yes | Yes | No | Yes | Yes | Yes |
| RWKV-4-Raven-3B-v11-Eng99-20230425-ctx4096 | Yes | Yes | Yes | No | Yes | Yes | Yes |
| RWKV-4-Raven-7B-v11x-Eng99-20230429-ctx8192 | Yes | Yes | Yes | No | Yes | Yes | Yes |
| RWKV-4-Raven-14B-v11x-Eng99-20230501-ctx8192 | Split | Yes | Yes | No | Yes | Yes | Yes |
Feel free to create a discussion if you have a request.
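In case it saves someone a full clone, a single quantized file can be fetched with `huggingface_hub`; the filename below is only a guess at the naming scheme, so check the repo's file listing for the real names:

```python
from huggingface_hub import hf_hub_download

# Download one quantized model from LoganDark/rwkv-4-raven-ggml.
# The filename is an assumption -- consult the repo for the actual file names.
path = hf_hub_download(
    repo_id='LoganDark/rwkv-4-raven-ggml',
    filename='RWKV-4-Raven-1B5-v11-Eng99-20230425-ctx4096-Q5_1.bin',
)
print(path)  # local cache path to the downloaded .bin
```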