Replies: 2 comments
-
I'm seeing the same thing with 4-bit models. It works fine with non-quantized models, though, so it looks like something broke recently.
-
It may not be related to your issue, but that happens to me when I forget to set at least some 'pre_layer' value in the Model tab.
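For reference, the same setting can also be passed when launching the server. This is only a minimal sketch, assuming this build supports the --pre_layer flag for GPTQ models; the value 30 is illustrative and should be tuned to how many layers fit on your GPU:
python.exe server.py --model vicuna-13b-GPTQ-4bit-128g --auto-devices --wbits 4 --groupsize 128 --pre_layer 30 --chat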
=========================
Windows Command:
python.exe server.py --model vicuna-13b-GPTQ-4bit-128g --auto-devices --wbits 4 --groupsize 128 --chat
Output:
INFO:Gradio HTTP request redirected to localhost :)
INFO:Loading vicuna-13b-GPTQ-4bit-128g...
INFO:Found the following quantized model: models\vicuna-13b-GPTQ-4bit-128g\vicuna-13b-4bit-128g.safetensors
INFO:Loaded the model in 3.46 seconds.
INFO:Loading the extension "gallery"...
Running on local URL: http://127.0.0.1:7860/
Could not create share link. Please check your internet connection or our status page: https://status.gradio.app/
I can input a prompt, but there is no response in the WebUI. Please help me check this out, thanks!