
Support Falcon #293

Open
zcourts opened this issue Jun 2, 2023 · 10 comments · Fixed by #313
Labels
topic:model-support Support for new models

Comments

@zcourts commented Jun 2, 2023

Similar to MPT, Falcon is Apache licensed, weights and all!

  1. https://huggingface.co/tiiuae/falcon-40b
  2. https://huggingface.co/tiiuae/falcon-40b-instruct

And according to the Hugging Face leaderboard, it outperforms all current open-source models, including MPT.

It seems a GGML conversion of the model is a necessary precursor to having it included (rough sketch of what that involves below).

I don't think I have the expertise to do this, but we may be able to help (e.g. we can give access to a V100S to do the conversion).
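
For anyone wondering what the conversion step actually involves, here is a minimal sketch of its general shape (not this project's actual tooling; real GGML converters such as ggml's convert-h5-to-ggml.py also serialize the vocabulary and per-architecture hyperparameters, and support quantized tensor types):

```python
# A minimal, illustrative sketch of dumping HF weights into a
# GGML-style binary: a magic number, then name/shape/type-tagged tensors.
# Real converters also write the vocabulary and a per-architecture
# hyperparameter block, and loading falcon-40b like this needs
# hundreds of GB of RAM, so treat this as illustration only.
import struct
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b",
    trust_remote_code=True,  # Falcon shipped custom modeling code at the time
)

with open("falcon-40b-f32.bin", "wb") as f:
    f.write(struct.pack("<i", 0x67676D6C))  # magic used by legacy ggml files
    # ... hyperparameters and vocabulary would be written here ...
    for name, tensor in model.state_dict().items():
        data = tensor.float().cpu().numpy()
        name_bytes = name.encode("utf-8")
        # n_dims, name length, ftype (0 = f32)
        f.write(struct.pack("<iii", data.ndim, len(name_bytes), 0))
        for dim in reversed(data.shape):  # ggml stores dims innermost-first
            f.write(struct.pack("<i", dim))
        f.write(name_bytes)
        data.tofile(f)
```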

LLukas22 added the topic:model-support (Support for new models) label on Jun 2, 2023
@LLukas22 (Contributor) commented Jun 2, 2023

Already on it. I got it converted and quantized, but it produced gibberish. I'm waiting on ggerganov/llama.cpp#1602 to see how they will handle the Q, K, V weights. I don't want to create two separate falcon-ggml ecosystems, so I'm waiting for the upstream ggml implementation.
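
For context on why the Q, K, V weights are the sticking point: Falcon fuses them into a single query_key_value matrix, and on the 40B variant the rows are interleaved per KV group ([q_0 .. q_15, k, v] repeated for each of the 8 groups) rather than laid out as contiguous [all Q | all K | all V] blocks. A rough numpy sketch of the de-interleave, with shapes taken from the falcon-40b config (hidden size 8192, 128 query heads, 8 KV heads); the layout here is my reading of the HF modeling code, not gospel:

```python
# Rough sketch of splitting Falcon-40B's fused query_key_value weight.
# Layout assumption (from my reading of the HF modeling code): rows are
# grouped as [16 query heads, 1 key head, 1 value head] per KV group.
import numpy as np

hidden, n_head, n_head_kv = 8192, 128, 8
head_dim = hidden // n_head        # 64
q_per_kv = n_head // n_head_kv     # 16 query heads per KV group

# Fused weight: ((q_per_kv + 2) * n_head_kv * head_dim, hidden) = (9216, 8192)
fused = np.random.randn((q_per_kv + 2) * n_head_kv * head_dim, hidden)

grouped = fused.reshape(n_head_kv, q_per_kv + 2, head_dim, hidden)
q = grouped[:, :q_per_kv].reshape(-1, hidden)     # (8192, 8192)
k = grouped[:, q_per_kv].reshape(-1, hidden)      # (512, 8192)
v = grouped[:, q_per_kv + 1].reshape(-1, hidden)  # (512, 8192)
```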

@zcourts (Author) commented Jun 2, 2023

There's an ongoing discussion worth tracking for the GGML conversion: ggerganov/llama.cpp#1602

Found after posting this: an attempt at the conversion has already been made, see ggerganov/llama.cpp#1602 (comment).

@zcourts (Author) commented Jun 2, 2023

Looks like our posts overlapped! Great to hear. I've offered to provide GPU access to further the work being done in ggerganov/llama.cpp#1602 and will follow up as that progresses.

@KerfuffleV2 (Contributor)

There is now a working GGML example for 40B: ggerganov/ggml#231

@LLukas22 (Contributor)

That's great! Maybe I'll create a draft, but I would like to wait until it gets merged into ggml.

@iHaagcom

A working one is here: https://github.com/jploski/ggml/tree/falcon40b

@LLukas22 (Contributor)

Yeah, I noticed that. It would be great if someone could try porting it to Rust. I'm currently quite busy implementing GPU acceleration for all architectures.😬

@philpax (Collaborator) commented Jun 28, 2023

Damn, was hoping editing the description would cancel out the issue-closing.

Anyhow - I've merged in the Falcon 7B implementation, but it doesn't handle 40B, and it requires 32-bit memory tensors as the repeat operation it uses doesn't work with 16-bit tensors. Because of these caveats - and the continuing work on (one of) the original implementations in https://github.com/cmp-nct/ggllm.cpp - I've decided to merge it in, but disable it by default.
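
To put the 32-bit memory-tensor caveat in numbers, here's a back-of-the-envelope KV cache calculation; shapes are assumed from the falcon-7b config (32 layers, multi-query attention with a single 64-dim KV head, 2048-token context), so the figures are illustrative:

```python
# Back-of-the-envelope KV cache sizing for falcon-7b, comparing the
# f16 memory tensors we'd like with the f32 ones the repeat op forces.
# Assumed shapes: 32 layers, one 64-dim KV head (multi-query), 2048 ctx.
n_layer, head_dim, n_ctx = 32, 64, 2048
elems = 2 * n_layer * n_ctx * head_dim  # K and V, per layer, per token

for dtype, bytes_per_elem in [("f16", 2), ("f32", 4)]:
    print(f"{dtype}: {elems * bytes_per_elem / 1024**2:.0f} MiB")
# f16: 16 MiB, f32: 32 MiB -- multi-query attention keeps either small
```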

I'll keep this issue open until Falcon is truly ready to fly.

@philpax (Collaborator) commented Jul 27, 2023

@LLukas22 should we close this or wait until the model format has stabilised?

@LLukas22 (Contributor)

We should wait until GGUF is implemented and we have all the necessary fields in the model file.
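
For reference, a rough sketch of what "the necessary fields" could look like once GGUF lands: a typed key/value metadata block right after the header. The key names follow the general.*/falcon.* convention from the GGUF draft spec, the concrete values are from the falcon-7b config, and the type ids are my reading of the spec, so treat all details as illustrative:

```python
# Hedged sketch of a GGUF header plus metadata KV block for Falcon.
# Values assume falcon-7b (hidden 4544, 32 blocks, 71 heads, MQA).
import struct

GGUF_MAGIC = b"GGUF"
T_UINT32, T_STRING = 4, 8  # GGUF metadata value-type ids (per draft spec)

def write_string(f, s: str):
    data = s.encode("utf-8")
    f.write(struct.pack("<Q", len(data)))
    f.write(data)

def write_kv_u32(f, key: str, value: int):
    write_string(f, key)
    f.write(struct.pack("<I", T_UINT32))
    f.write(struct.pack("<I", value))

def write_kv_str(f, key: str, value: str):
    write_string(f, key)
    f.write(struct.pack("<I", T_STRING))
    write_string(f, value)

metadata = {
    "general.architecture": "falcon",
    "falcon.embedding_length": 4544,
    "falcon.block_count": 32,
    "falcon.attention.head_count": 71,
    "falcon.attention.head_count_kv": 1,  # multi-query: one KV head on 7B
}

with open("falcon-7b-metadata-only.gguf", "wb") as f:
    f.write(GGUF_MAGIC)
    f.write(struct.pack("<I", 2))              # GGUF version 2
    f.write(struct.pack("<Q", 0))              # tensor count (none in sketch)
    f.write(struct.pack("<Q", len(metadata)))  # metadata KV count
    for key, value in metadata.items():
        if isinstance(value, str):
            write_kv_str(f, key, value)
        else:
            write_kv_u32(f, key, value)
```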
