add Falcon 40B model support #368

skirodev · 2023-07-15T10:50:49Z

This PR passes the tests with the Falcon mini series models, but because of my device's limitations I haven't tested it with the original Falcon series models. #293

philpax · 2023-07-15T10:55:19Z

Looks promising, I'll leave it to Lukas to check the specifics.

Is the model definition/file format beginning to stabilise a little? Part of the reason it's still experimental is because, as far as I can tell, upstream is still figuring out the specifics of the implementation.

skirodev · 2023-07-15T11:14:34Z

@philpax Yeah, perhaps it should remain in an experimental state until it undergoes all necessary tests and stabilizes.

LLukas22 · 2023-07-15T14:59:12Z

@skirodev At first glance this looks good. How did you create your ggmlv3 versions of the model? I would like to try 7B- and 40B-instruct before I merge this.

skirodev · 2023-07-15T15:39:56Z

@LLukas22 I just uploaded the conversion code (f16 and f32 only) to Hugging Face. The quantization code is not complete yet.

LLukas22 · 2023-07-15T15:43:24Z

Thanks! You can probably just use llm to quantize your converted models; it should support the normal Q-quants if you implemented the hyperparams correctly.

skirodev · 2023-07-15T16:00:47Z

Thanks for the suggestion! I'll try using llm to quantize the converted models and see how it works.

skirodev · 2023-07-16T06:10:18Z

Here is the performance with the falcon-7b-instruct-ggml-q4_0.bin model:

cargo run --release -- infer --no-float16 -n 256 -a falcon -m "./models/falcon-7b-instruct-ggmlv3-q4_0.bin" --batch-size 512 -p "write a story about falcon" --stats

LLukas22 · 2023-07-16T08:19:54Z

Alright, I used your script to create ggml versions of falcon-7B and falcon-40B and quantized them with llm to q4_0.

7B works as expected. Output:

write a story about falconry
About
Portfolio
Contacts
For all that is left in the forest, hunt down foxes, wolves, rabbits, and even birds of prey. The first is the hunt, the second is the capture of the bird, and the third is the training of the hawk. Falconers also breed hawks, and some birds may pass through more than one set of hands before they are considered ready for sale or use as a pet. However, you need to be an experienced bird owner before you start a hunt. There are three main types of hunting hawks: lure, live lure, and lure/live. Falconers are trained in many different types of birds, each with its own set of specialized handling techniques and training methods. Falconers hunt with two birds - one as a hawk and the other as a lure bird. For example, falconers use a lure hawk to hunt with live birds and lure hawks to hunt with lure birds. The lure hawk uses birds that are already wild and free to hunt for its own food, whereas the live hawk hunts live birds that are kept in captivity, either for the purpose of training or for hunting purposes. The birds that hunt with lure birds hunt birds like pigeons, doves, and prairie

Sadly 40B produces gibberish 😞. Output:

write a story about falcon/ it or  and  I your body, on the all of 
 not of the  it  new/  can a all you  the so  so. more- this. our past- so in: so I to so you  we'. in  we a  you  it a no.’ your people should a I of it,  by and, and  all the I  to, of

 that what is not all a. for  I can new
 for  in an and we you from all:  you may in of  all
/ to  - D

I don't know if the inference code is the problem or if the conversion-script/quantization corrupted some tensors.
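One rough way to narrow this down is to compare per-tensor statistics between the f16/f32 file and the quantized one: a corrupted tensor tends to show non-finite values or a very different value range. The sketch below is a hypothetical helper, not part of llm or ggml. It assumes the tensor data has already been read and dequantized into an `f32` slice, and `TensorStats` and `tensor_stats` are illustrative names.

```rust
/// Summary statistics for one tensor.
/// Hypothetical helper for debugging; not an llm or ggml API.
#[derive(Debug, PartialEq)]
pub struct TensorStats {
    /// Number of NaN or infinite values found.
    pub non_finite: usize,
    /// Smallest finite value.
    pub min: f32,
    /// Largest finite value.
    pub max: f32,
    /// Mean of the finite values.
    pub mean: f32,
}

/// Computes statistics over the finite values of `data`.
/// Returns `None` if the slice contains no finite values at all
/// (including the empty slice), which is itself a red flag.
pub fn tensor_stats(data: &[f32]) -> Option<TensorStats> {
    let mut non_finite = 0usize;
    let mut finite = 0usize;
    let mut min = f32::INFINITY;
    let mut max = f32::NEG_INFINITY;
    // Accumulate in f64 to limit rounding error on large tensors.
    let mut sum = 0.0f64;

    for &x in data {
        if x.is_finite() {
            finite += 1;
            min = min.min(x);
            max = max.max(x);
            sum += f64::from(x);
        } else {
            non_finite += 1;
        }
    }

    if finite == 0 {
        return None;
    }

    Some(TensorStats {
        non_finite,
        min,
        max,
        mean: (sum / finite as f64) as f32,
    })
}
```

If the statistics of a tensor match closely between the two files, the conversion and quantization steps are less likely to be at fault for that tensor, which points back toward the inference code.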

skirodev · 2023-07-16T09:45:50Z

> Sadly 40B produces gibberish 😞. Output:
>
> write a story about falcon/ it or  and  I your body, on the all of 
>  not of the  it  new/  can a all you  the so  so. more- this. our past- so in: so I to so you  we'. in  we a  you  it a no.’ your people should a I of it,  by and, and  all the I  to, of
>
>  that what is not all a. for  I can new
>  for  in an and we you from all:  you may in of  all
> / to  - D
>
> I don't know if the inference code is the problem or if the conversion-script/quantization corrupted some tensors.

My apologies, could you retest it now?

LLukas22 · 2023-07-16T10:56:47Z

     Running `target\release\llm.exe infer --no-float16 -n 256 -a falcon -m C:\Users\lkreu\Desktop\falcon\falcon-40b-q4_0 --batch-size 512 -p "write a story about falcon" --stats -r tiiuae/falcon-40b`
⣟ Loading model...[2023-07-16T10:54:11Z INFO  cached_path::cache] Cached version of https://huggingface.co/tiiuae/falcon-40b/resolve/main/tokenizer.json is up-to-date
✓ Loaded 484 tensors (23.5 GB) after 750ms
write a story about falcon, our you this the . more to  not and
 on your so the- you.  I. it. we may a so it for that or any
 the and, and for more the and  we  I by no  so in and just   of our- a can of. all,? I it  at the 
 so,  is it of my personal to that. 
 the and by is it

It got a bit better but there probably still is something wrong. Maybe we should wait for the ggml update and revisit it then?

skirodev · 2023-07-16T11:09:34Z

> It got a bit better but there probably still is something wrong. Maybe we should wait for the ggml update and revisit it then?

Absolutely. I agree with your suggestion. Let's await the ggml update and take another look at it when the time comes. Thank you for conducting the testing.

LLukas22 · 2023-07-16T16:07:28Z

Could you try to pull the latest main into this? It should contain the latest ggml version.

skirodev · 2023-07-17T04:08:30Z

> Could you try to pull the latest main into this? It should contain the latest ggml version.

Sure, I already pulled the latest main branch for the updated ggml version.

LLukas22 · 2023-07-18T08:03:38Z

Falcon 40B still produces gibberish:

PS F:\Github\llm-main> cargo run --release --features falcon -- infer -a falcon -p "Tell me a story about a falcon" -m "C:\Users\lkreu\Desktop\falcon\falcon-40b-q4_0" --no-float16
    Finished release [optimized] target(s) in 0.24s
     Running `target\release\llm.exe infer -a falcon -p "Tell me a story about a falcon" -m C:\Users\lkreu\Desktop\falcon\falcon-40b-q4_0 --no-float16`
✓ Loaded 484 tensors (23.5 GB) after 114ms
Tell me a story about a falcon . more to or- they in, you your. de/ /   to  they.: 
  you, I so good more on to one. on a for all  I  so,.  i  or  and - and it  that  :: and la

Maybe I need to reconvert and requantize my model 🤔

skirodev · 2023-07-26T16:18:28Z

In my testing, Falcon 40B now runs inference successfully.

cargo run --release -- infer -a falcon -m "./models/falcon-40b-instruct-ggmlv3-q4_0.bin" --batch-size 512 -p "write a story about falcon" -r tiiuae/falcon-40b --stats

The majestic bird of prey soared through the sky, its wingspan stretching outwards as it searched for prey. Its sharp 
eyes scanned the horizon, and in an instant, it spotted movement below. With powerful strokes of its wings, it dove 
towards its target at incredible speeds before striking with lightning-fast precision. The falcon was a symbol of 
strength, agility, and intelligence – an awe-inspiring creature that commanded respect from all who saw it soar above.

However, the embedded tokenizer code still needs changes: Falcon does not require prepending a BOS token ID, and it has some special tokens. How this gets handled may depend on the implementation of the GGUF format.
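The BOS point can be illustrated with a small sketch. This is not the code from this PR or the llm tokenizer API; `TokenId` and `with_optional_bos` are hypothetical names, and the idea is simply that the model decides whether a BOS token ID is prepended, with `None` standing in for a model like Falcon that does not need one.

```rust
/// Token ID type used for illustration only.
pub type TokenId = u32;

/// Builds the token sequence that is fed to the model.
///
/// `bos` is `Some(id)` for models that expect a BOS token and `None`
/// for models that do not. Hypothetical helper, not an llm API.
pub fn with_optional_bos(prompt_tokens: &[TokenId], bos: Option<TokenId>) -> Vec<TokenId> {
    let mut out = Vec::with_capacity(prompt_tokens.len() + 1);

    if let Some(id) = bos {
        // Skip the push if the tokenizer already emitted a leading BOS,
        // so the model never sees it twice.
        if prompt_tokens.first() != Some(&id) {
            out.push(id);
        }
    }

    out.extend_from_slice(prompt_tokens);
    out
}
```

Passing the BOS ID as an `Option` keeps the decision in one place, so the same prompt-building path can serve models with and without a BOS token.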

LLukas22 · 2023-07-26T18:49:06Z

Good Job 👍

I'll give this another look tomorrow, and if everything works I'm gonna merge it.

philpax

Code looks good! Will leave it to Lukas to do final tests but OK from my end. Hopefully we can get the additional information from GGUF soon.

LLukas22 · 2023-07-27T09:40:24Z

LGTM 👍

It now even works with the fp16 memory :D

ghost · 2023-08-02T01:17:45Z

Does it work with Metal?

LLukas22 · 2023-08-02T09:22:18Z

@jempabroni Maybe. It depends on whether all the necessary operations have already been ported to Metal shaders. You can try it, and if it gives you an invalid operation error, it's not supported yet.

skirodev added 2 commits July 15, 2023 18:34

add Falcon 40B model support

dadb104

disable falcon by default

e28e0ef

philpax requested a review from LLukas22 July 15, 2023 10:54

skirodev added 2 commits July 15, 2023 19:29

fix formatting

5b757aa

remove needless borrow

334e8be

fix bot token id

5c40a19

LLukas22 reviewed Jul 16, 2023

crates/ggml/src/context.rs (outdated, resolved)

crates/llm/Cargo.toml (outdated, resolved)

crates/models/falcon/src/lib.rs (resolved)

skirodev added 2 commits July 16, 2023 17:17

disable falcon by default

71c3273

fix attention_norm weight error

565ca6d

skirodev and others added 2 commits July 17, 2023 09:21

Merge branch 'rustformers:main' into feat/falcon

9ae6e56

fix bigv

a804a5f

fix formatting

191ec02

philpax requested a review from LLukas22 July 17, 2023 10:10

Merge branch 'rustformers:main' into feat/falcon

bce0b9a

skirodev and others added 2 commits July 26, 2023 13:52

Merge branch 'rustformers:main' into feat/falcon

871a5d8

remove bos token id and use float16 kv memory type

0afd18e

philpax approved these changes Jul 27, 2023

LLukas22 approved these changes Jul 27, 2023

LLukas22 merged commit 2259555 into rustformers:main Jul 27, 2023
14 checks passed

hhamud mentioned this pull request Aug 7, 2023

Write a 0.2 changelog #244

Open
