# Falcon Model Download
Model file versions:
- **pytorch_model.bin HF models**
  - These Python transformers models from Huggingface cannot be used directly.
  - Use falcon_convert.py to convert them into Foundation version models (see below; a convert-then-quantize sketch follows this list).
- **Foundation versions**
  - These are the 16 and 32 bit primitive ggml V1 models created by the falcon_convert.py Python script.
  - They are not to be used directly by ggllm.cpp, only as input for falcon_quantize.
  - You need tokenizer.json in the same directory to quantize them; it is available on the Huggingface Files page of each model.
- **Version 4 GGML-GGJT-v3**
  - Quantized from 2 to 8 bit or 16 bit.
  - These are legacy models created by the first versions of falcon_quantize.
  - To run these models you need to put tokenizer.json into the same directory.
- **Version 10 GGCC**
  - Quantized from 2 to 8 bit or 16 bit.
  - This is the current optimized format for ggllm.cpp; tokenizer.json is no longer required.
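The full pipeline is only two commands. Here is a hedged sketch, assuming falcon_convert.py takes a model directory, an output directory, and a 0/1 ftype flag for 32/16 bit output (as in the ggml conversion scripts it derives from), and that falcon_quantize takes an input file, an output file, and a quantization type; the paths and output filenames below are placeholders, so verify the exact arguments against each tool's usage message:

```bash
# Sketch: convert HF weights to a 32 bit Foundation model, then quantize to GGCC v10.
# Argument order, flag values, and filenames are assumptions - verify against your build.

# 1) Convert; tokenizer.json must be present in models/falcon-7b for the quantizer.
python falcon_convert.py models/falcon-7b models/falcon-7b 0   # 0 = 32 bit mode (assumed)

# 2) Quantize the Foundation model down to 4 bit (q4_0 shown; other types exist).
./falcon_quantize models/falcon-7b/ggml-model-f32.bin \
                  models/falcon-7b/ggml-model-q4_0.bin q4_0
```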
You can either download the original HF models, convert them, and then quantize them to the size of your choice, or download premade quantized models. TheBloke provides fine-tuned weights in GGCC v10 with various quantization options:
- https://huggingface.co/TheBloke/falcon-40b-sft-mix-1226-GGML (OpenAssistant 40B)
- https://huggingface.co/TheBloke/falcon-40b-instruct-GGML
- https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-40B-GGML
- https://huggingface.co/TheBloke/falcon-7b-instruct-GGML
- https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-7B-GGML
The official HF models are here:

- https://huggingface.co/tiiuae/falcon-40b/
- https://huggingface.co/tiiuae/falcon-7b/
- https://huggingface.co/tiiuae/falcon-40b-instruct
- https://huggingface.co/tiiuae/falcon-7b-instruct
OpenAssistant models are here:

- https://huggingface.co/OpenAssistant
- https://huggingface.co/OpenAssistant/falcon-7b-sft-mix-2000
- https://huggingface.co/OpenAssistant/falcon-40b-sft-mix-1226
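One way to fetch any of the repositories above is a plain git-lfs clone of the Huggingface repo; the 7B instruct model below is just an example, and the clone already includes the tokenizer.json that the quantizer needs:

```bash
# Download original HF weights via git-lfs (several gigabytes; any repo above works the same way).
git lfs install
git clone https://huggingface.co/tiiuae/falcon-7b-instruct models/falcon-7b-instruct
```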
To build your own quantized model: download the 7B or 40B Falcon version, run falcon_convert.py (latest version) in 32 bit mode, then use falcon_quantize to convert the result to GGCC v10.
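Once quantized, the model can be run with the project's falcon_main binary. A minimal sketch, reusing the output path assumed in the pipeline example above (thread count, prompt, and token count are arbitrary):

```bash
# Run the quantized GGCC v10 model; no tokenizer.json is needed next to it.
./falcon_main -t 8 -m models/falcon-7b/ggml-model-q4_0.bin \
  -p "Building a website can be done in 10 simple steps:" -n 64
```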