
Add support for ggllm.cpp #3357

Closed
djmaze opened this issue Jul 28, 2023 · 14 comments
Labels
enhancement (New feature or request), stale

Comments

@djmaze

djmaze commented Jul 28, 2023

Falcon is one of the very few good multilingual models. Support for the Falcon family of models (7B / 40B) in text-generation-webui is currently very limited: 4-bit only through AutoGPTQ, with bad performance, and it needs at least 35 GB of VRAM.

ggllm.cpp is optimized for running quantized versions of those models and runs much faster. It also supports quantization down to 2-bit, which allows running the 40B model on a single 24 GB GPU.
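As a rough back-of-the-envelope sketch (my own numbers, counting weights only and ignoring k-quant block overhead and the KV cache), the memory needed scales directly with the bit width:

# Approximate weight-only footprint of a 40B-parameter model at different quantization widths.
# Rough estimates only; real GGML k-quants add per-block scale overhead, and the KV cache
# needs extra memory on top of this.
params = 40e9
for bits in (16, 8, 4, 3, 2):
    gib = params * bits / 8 / 1024**3
    print(f"{bits}-bit: ~{gib:.0f} GiB")
# 16-bit: ~75 GiB, 4-bit: ~19 GiB, 3-bit: ~14 GiB, 2-bit: ~9 GiB

which is why the 2- and 3-bit variants are the only ones that fit on a single 24 GB card.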

@djmaze djmaze added the enhancement label on Jul 28, 2023
@Flanua

Flanua commented Jul 29, 2023

Just my two cents: I personally don't recommend quantizing anything down to 2 bits, or to anything lower than 8 bits for that matter. The AI model degrades significantly.

@jllllll
Contributor

jllllll commented Jul 30, 2023

Not technically a duplicate, but close enough to: #3351

CTransformers is what would be used if ggllm.cpp were to be integrated.
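For anyone curious what that would look like, here is a minimal sketch of loading a Falcon GGML file through the ctransformers Python API (the path, gpu_layers value, and prompt are placeholders, not anything tested against this repo):

# Minimal ctransformers sketch; the model path/repo and parameters are placeholders.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "/path/to/falcon-40b-ggml",   # local folder or Hugging Face repo id holding the GGML file
    model_type="falcon",          # selects the Falcon (ggllm.cpp-based) backend
    gpu_layers=50,                # layers to offload to the GPU when built with CUDA
)
print(llm("Write a short greeting in German:", max_new_tokens=64))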

@djmaze
Author

djmaze commented Jul 30, 2023

> Just my two cents: I personally don't recommend quantizing anything down to 2 bits, or to anything lower than 8 bits for that matter. The AI model degrades significantly.

Not sure I understood you correctly. I am using the 3-bit quantized version (GGML_TYPE_Q3_K) of the linked model with ggllm.cpp, and AFAICS it works really well. Especially for German, I have not found a better model yet.

@jllllll
Contributor

jllllll commented Jul 30, 2023

3-bit is only worth using on the larger model sizes (30B+), where it makes less of a difference. It does lower output quality significantly, but it is still worth it if it lets you run a larger model. 2-bit is pretty much useless though.

@goodglitch

> Just my two cents: I personally don't recommend quantizing anything down to 2 bits, or to anything lower than 8 bits for that matter. The AI model degrades significantly.

> Not sure I understood you correctly. I am using the 3-bit quantized version (GGML_TYPE_Q3_K) of the linked model with ggllm.cpp, and AFAICS it works really well. Especially for German, I have not found a better model yet.

Try a 2-bit model on a moderately difficult coding task and you will see what he is talking about. Anything that requires precision is out of the question. It is good that, for your use case, a lower quantization still produces output that makes sense.

@oobabooga
Owner

ctransformers is on my radar; I'll merge one of the open PRs adding support for it soon. It's always a challenge to add new backends because they usually don't come with precompiled wheels.

@jllllll
Contributor

jllllll commented Jul 30, 2023

> ctransformers is on my radar; I'll merge one of the open PRs adding support for it soon. It's always a challenge to add new backends because they usually don't come with precompiled wheels.

I'm currently in the process of building pre-compiled wheels for CUDA 11.7.
ctransformers has a wheel for CUDA 12.1, but that isn't particularly useful for us as we use CUDA 11.7.

Fortunately, ctransformers already handles CUDA and non-CUDA builds internally, so a separate package won't be needed like with llama-cpp-python.

@oobabooga
Owner

That's very nice to hear @jllllll.

@jllllll
Contributor

jllllll commented Jul 30, 2023

> That's very nice to hear @jllllll.

https://github.com/jllllll/ctransformers-cuBLAS-wheels/releases/download/AVX2/ctransformers-0.2.16+cu117-py3-none-any.whl

This wheel includes CUDA binaries for both Windows and Linux. macOS is also supported through non-CUDA binaries.
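For anyone who wants to test it before a release lands, installing it should just be a matter of pointing pip at that wheel, e.g.:

pip install https://github.com/jllllll/ctransformers-cuBLAS-wheels/releases/download/AVX2/ctransformers-0.2.16+cu117-py3-none-any.whl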

tallesairan added a commit to tallesairan/text-generation-webui that referenced this issue Aug 1, 2023
@RDearnaley

RDearnaley commented Aug 24, 2023

+1 on this request, mostly for running Falcon 40B quantized in GGML (in my case, on Apple Silicon).

> Not technically a duplicate, but close enough to: #3351
>
> CTransformers is what would be used if ggllm.cpp were to be integrated.

I note that #3351 and #3313 are now done, so does that mean this is now working, or just that it's unblocked?

@jllllll
Contributor

jllllll commented Aug 25, 2023

ctransformers has been implemented as a loader, which includes ggllm.cpp.

It should work.
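If I remember the flags correctly, something along these lines should exercise the new loader (the model folder name here is just a placeholder):

python server.py --loader ctransformers --model_type falcon --model your-falcon-ggml-folder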

@github-actions github-actions bot added the stale label Oct 6, 2023
@github-actions

github-actions bot commented Oct 6, 2023

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

@github-actions github-actions bot closed this as completed Oct 6, 2023
@augchan42

Still doesn't work for me; I'm on a MacBook Pro M2 Max (Apple Silicon).
Output from pip3 install -r requirements.txt:

Installing collected packages: ctransformers
Attempting uninstall: ctransformers
Found existing installation: ctransformers 0.2.27
Uninstalling ctransformers-0.2.27:
Successfully uninstalled ctransformers-0.2.27
Successfully installed ctransformers-0.2.27+cu121
(venv) auchan@Augustins-MBP text-generation-webui %

Then when trying to load the model after restarting the server:
2023-12-05 09:43:16 ERROR:Failed to load the model.
Traceback (most recent call last):
File "/Users/auchan/projects/text-generation-webui/modules/ui_model_menu.py", line 209, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/auchan/projects/text-generation-webui/modules/models.py", line 85, in load_model
output = load_func_map[loader](model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/auchan/projects/text-generation-webui/modules/models.py", line 280, in ctransformers_loader
from modules.ctransformers_model import CtransformersModel
File "/Users/auchan/projects/text-generation-webui/modules/ctransformers_model.py", line 1, in <module>
from ctransformers import AutoConfig, AutoModelForCausalLM
ModuleNotFoundError: No module named 'ctransformers'

@augchan42

augchan42 commented Dec 5, 2023

Might be because it's tied to AVX2:
Collecting ctransformers==0.2.27+cu121 (from -r requirements.txt (line 88))
Downloading https://github.com/jllllll/ctransformers-cuBLAS-wheels/releases/download/AVX2/ctransformers-0.2.27+cu121-py3-none-any.whl (15.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.5/15.5 MB 12.2 MB/s eta 0:00:00
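A possible workaround I may try (untested, going by the ctransformers README) is to drop the CUDA/AVX2 wheel and install a plain build from PyPI instead, optionally compiled with Metal support:

pip uninstall -y ctransformers
pip install ctransformers
# or, for Metal acceleration on Apple Silicon:
CT_METAL=1 pip install ctransformers --no-binary ctransformers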
