Falcon Model Download

Model file versions:

  • pytorch_model.bin HF models:
    These Python transformers models from Huggingface cannot be used directly.
    Use falcon_convert.py to convert them into Foundation version models (see below).
  • Foundation versions:
    These are the 16 and 32 bit primitive ggml V1 models created by the falcon_convert.py Python script.
    They are not to be used directly by ggllm.cpp, only as input to falcon_quantize.
    You need tokenizer.json in the same directory to quantize them;
    tokenizer.json is available on the Huggingface Files page of each model.
  • Version 4 GGML-GGJT-v3:
    Quantized to 2 to 8 bit, or 16 bit.
    These are legacy models created by the first versions of falcon_quantize.
    To run these models you need to put tokenizer.json into the same directory.
  • Version 10 GGCC:
    Quantized to 2 to 8 bit, or 16 bit.
    This is the current optimized format for ggllm.cpp; tokenizer.json is no longer required.
    (A small script to identify which format a given file is in is sketched after this list.)
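
If you are unsure which of these formats a file on disk actually is, the first few bytes tell you. The sketch below (a minimal Python check, not part of ggllm.cpp) reads the magic and version fields as little-endian integers; the ggml and ggjt magics are the well-known values from the llama.cpp lineage, while the ggcc magic is an assumption inferred from the format name.

```python
import struct
import sys

# Known magic values from the llama.cpp lineage; the 'ggcc' value is an
# assumption inferred from the format name used by ggllm.cpp.
MAGICS = {
    0x67676D6C: "ggml (unversioned v1 Foundation file)",
    0x67676A74: "ggjt (versioned, e.g. GGML-GGJT-v3)",
    0x67676363: "ggcc (ggllm.cpp optimized format)",  # assumed value
}

with open(sys.argv[1], "rb") as f:
    (magic,) = struct.unpack("<I", f.read(4))
    print("magic:", MAGICS.get(magic, f"unknown (0x{magic:08x})"))
    # Unversioned ggml v1 files have no version field after the magic.
    if magic != 0x67676D6C:
        (version,) = struct.unpack("<I", f.read(4))
        print("version:", version)
```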

You can either download the original HF models, convert them, and then quantize them to the size of your choice,
or download premade quantized models; TheBloke (see below) offers a few variants.
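
Both routes start with a download from Huggingface. Below is a minimal sketch using the huggingface_hub Python package; the repo IDs come from the links on this page, but the quantized filename is a placeholder, since the exact variant names differ per repo (check the repo's Files page).

```python
from huggingface_hub import hf_hub_download, snapshot_download

# Route 1: fetch a full original HF checkpoint (weight shards, config,
# tokenizer.json) for later conversion with falcon_convert.py.
model_dir = snapshot_download(repo_id="tiiuae/falcon-7b-instruct")

# Route 2: fetch a single premade quantized file from TheBloke.
# The filename below is hypothetical; pick the real one from the
# repo's Files page.
ggml_path = hf_hub_download(
    repo_id="TheBloke/falcon-7b-instruct-GGML",
    filename="falcon-7b-instruct.ggccv1.q4_0.bin",  # hypothetical name
)
print(model_dir, ggml_path)
```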

TheBloke offers fine-tuned weights in GGCC v10 with various quantization options:
https://huggingface.co/TheBloke/falcon-40b-sft-mix-1226-GGML (OpenAssistant 40B)
https://huggingface.co/TheBloke/falcon-40b-instruct-GGML
https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-40B-GGML
https://huggingface.co/TheBloke/falcon-7b-instruct-GGML
https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-7B-GGML

The official HF models are here:
https://huggingface.co/tiiuae/falcon-40b/
https://huggingface.co/tiiuae/falcon-7b/
https://huggingface.co/tiiuae/falcon-40b-instruct
https://huggingface.co/tiiuae/falcon-7b-instruct

The OpenAssistant models are here: https://huggingface.co/OpenAssistant
https://huggingface.co/OpenAssistant/falcon-7b-sft-mix-2000
https://huggingface.co/OpenAssistant/falcon-40b-sft-mix-1226
Download the 7B or 40B Falcon version, run falcon_convert.py (latest version) in 32-bit mode, then run falcon_quantize to convert the result to GGCC v10.
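
As a rough end-to-end sketch of that pipeline: this page does not spell out the exact command-line arguments of falcon_convert.py or falcon_quantize, so the argument order below is an assumption (falcon_quantize is assumed to mirror llama.cpp's quantize tool: input file, output file, quantization type); check each tool's usage output before running.

```python
import subprocess

hf_dir = "models/falcon-7b-instruct"       # downloaded HF checkpoint
f32_dir = "models/falcon-7b-instruct-f32"  # conversion output directory

# Step 1: HF checkpoint -> 32-bit ggml v1 Foundation file.
# Argument order and the "1" flag for 32-bit mode are assumptions.
subprocess.run(["python", "falcon_convert.py", hf_dir, f32_dir, "1"],
               check=True)

# Step 2: Foundation file -> quantized GGCC v10.
# Input/output filenames and the quantization type name are assumptions.
subprocess.run(["./falcon_quantize",
                f"{f32_dir}/ggml-model-f32.bin",
                "models/falcon-7b-instruct.ggccv10.q4_0.bin",
                "q4_0"],
               check=True)
```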