
Falcon support? #24

Closed
matthoffner opened this issue Jun 16, 2023 · 13 comments

@matthoffner

I've been tracking the Falcon PR ggerganov/ggml#231, and as I understand it, it currently won't work with a released version of ggml.

Any suggestions on how to test it config-wise are appreciated; I'm assuming the llama model type might not work, based on other PRs.

@marella
Owner

marella commented Jun 16, 2023

I'm also waiting for that PR to be merged. Hopefully it will be merged this weekend ggerganov/ggml#231 (comment)

@matthoffner
Author

Thanks, do you know if it is possible to point ctransformers to a branch of ggml for testing?

@TheBloke

+1 to this

However, I don't think the ggml PR is the one to implement. Instead, I would use the new implementation in ggllm.cpp: https://github.com/cmp-nct/ggllm.cpp

This is now the best Falcon GGML implementation, including CUDA GPU acceleration with support for both the 7B and 40B models.

I don't know if this will also end up in the GGML repo, or maybe even eventually in the llama.cpp repo (as ggllm.cpp is a fork of llama.cpp).

But either way, this is the Falcon implementation of interest right now.

And I wonder whether there's even a need to wait for it to be fully stable? It's already useful and being used by people. I have four Falcon GGML repos now.

If ctransformers supported this, I think it would help accelerate the use of Falcon GGML.

@marella
Owner

marella commented Jun 22, 2023

@matthoffner It is not possible to point ctransformers to a branch of ggml, as the model code has to be modified to integrate it into the common interface I provide for all models.

Thanks @TheBloke. I was waiting for the PR to be merged, but since you are already providing the files, I added experimental support for Falcon models using the ggllm fork in the latest version, 0.2.10.

It has CUDA support similar to the LLaMA models. I tested with the 7B model, but my machine doesn't have enough memory for the 40B model.
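
For anyone who wants to try it, a minimal sketch of loading one of the Falcon GGML repos with CUDA offload (the `gpu_layers` value and the prompt are illustrative assumptions, not tested settings):

```python
from ctransformers import AutoModelForCausalLM

# Experimental Falcon support (ctransformers >= 0.2.10).
# gpu_layers offloads that many layers to CUDA; set 0 for CPU-only.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/falcon-7b-instruct-GGML",
    model_type="falcon",
    gpu_layers=32,  # illustrative value; tune to your VRAM
)

print(llm("Write a haiku about falcons:", max_new_tokens=64))
```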

@TheBloke

Fantastic! That's great news, thank you marella. That was super quick.

I will update my READMEs to mention this.

@ParisNeo could you check if this works automatically in LoLLMS, and if so maybe add some Falcon GGML entries? Then I will also mention it in the README, and you will be the first UI to support Falcon GGML! :)

@ParisNeo

[screenshot]
https://discord.com/channels/@me/1097295255801442306/1121584559138545704

I am using 0.2.10.

Am I missing something?

@marella
Owner

marella commented Jun 22, 2023

You should use model_type="falcon":

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("TheBloke/falcon-7b-instruct-GGML", model_type="falcon")
```

@matthoffner
Author

@marella @TheBloke Thank you!! I think I've got the 40B running on GPU in a HF Space:

https://huggingface.co/spaces/matthoffner/falcon-fastapi

@TheBloke

I've added config.json to my four repos, so manual model_type selection by the client shouldn't be needed from now on.

[screenshot]
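
In other words, loading should now work without specifying the type (a minimal sketch, assuming ctransformers picks up model_type from the repo's config.json):

```python
from ctransformers import AutoModelForCausalLM

# With config.json present in the repo, the model type should be detected automatically.
llm = AutoModelForCausalLM.from_pretrained("TheBloke/falcon-7b-instruct-GGML")
```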

@ParisNeo

Thank you very much for this nice work. I tested the 7B model on my PC and it is really solid, even compared to 13B models from other families.
@marella do you have a Twitter account so I can follow you and put a link when I credit your work?

@TheBloke

TheBloke commented Jun 24, 2023 via email

@marella
Owner

marella commented Jun 24, 2023

Hey, I don't have a Twitter account. I'm on LinkedIn (https://www.linkedin.com/in/ravindramarella/) but I don't post anything there.
If you want to link, you can just link to this repo.

@ParisNeo

Ok. Very nice profile by the way. Nice to meet you.
