-
Thankfully, it seems some work is being done to make llama.cpp interoperate with other programs. Crossing my fingers that we can use llama.cpp in text-generation-webui in the near future. Running on an i7-12700KF I get:
500 ms/token -> 30B Model
Since I am limited to 8 GB of VRAM, CPU inference is the only way for me, and probably for the vast majority of people, to run a model larger than 7B. Implementing support for llama.cpp could be the gateway to much higher adoption of text-generation-webui, since the default user experience of llama.cpp is lacking. text-generation-webui would allow much better and more advanced use of the model, pushing the boundaries of what is conceivably possible on consumer hardware.
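As a rough illustration of what hooking llama.cpp into a Python webui could look like, here is a minimal sketch that shells out to the llama.cpp `main` binary. The paths, flags, and output handling are assumptions for illustration, not the actual text-generation-webui integration:

```python
# Minimal sketch of one possible integration path: shelling out to the
# llama.cpp CLI from Python. The binary/model paths and the output
# handling below are assumptions, not actual text-generation-webui code.
import subprocess

LLAMA_BIN = "./llama.cpp/main"                     # assumed path to the compiled binary
MODEL_PATH = "./models/30B/ggml-model-q4_0.bin"    # assumed 4-bit quantized model file

def generate(prompt: str, n_tokens: int = 128, threads: int = 8) -> str:
    """Run llama.cpp in one-shot mode and return the generated text."""
    result = subprocess.run(
        [LLAMA_BIN, "-m", MODEL_PATH, "-p", prompt,
         "-n", str(n_tokens), "-t", str(threads)],
        capture_output=True, text=True, check=True,
    )
    out = result.stdout
    # llama.cpp echoes the prompt before the completion, so strip it if present.
    return out[len(prompt):] if out.startswith(prompt) else out

if __name__ == "__main__":
    print(generate("Building a website can be done in 10 simple steps:"))
```

A real integration would more likely keep the model loaded between requests and stream tokens back to the UI, rather than paying the model-load cost on every call as this one-shot sketch does.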
-
Hello,
https://github.com/ggerganov/llama.cpp
I wanted to know if you would be willing to integrate llama.cpp into your webui. With this implementation, we would be able to run the 4-bit version of LLaMA 30B with just 20 GB of RAM (no GPU required), and only 4 GB of RAM would be needed for the 7B (4-bit) model. Combining your repository with ggerganov's would give us the best of both worlds.
If anyone is wondering what speed to expect from CPU-only inference, here are the averages I got on my Intel Core i7-10700K:
160 ms/token -> 7B Model
350 ms/token -> 13B Model
760 ms/token -> 30B Model
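For context, here is a quick back-of-the-envelope sketch (Python) that turns those ms/token averages into tokens per second and an approximate wall-clock time per reply; the 200-token reply length is just an assumed example:

```python
# Convert the ms/token averages above into tokens/s and an approximate
# wall-clock time per reply. The 200-token reply length is an assumption.
ms_per_token = {"7B": 160, "13B": 350, "30B": 760}
reply_tokens = 200

for model, ms in ms_per_token.items():
    tokens_per_s = 1000 / ms
    reply_seconds = reply_tokens * ms / 1000
    print(f"{model}: {tokens_per_s:.1f} tokens/s, ~{reply_seconds:.0f} s per {reply_tokens}-token reply")
# e.g. 30B: 1.3 tokens/s, ~152 s per 200-token reply
```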