Running the model directly (no adapters) works fine. However, when I use any adapter (my own, as well as several public adapters from HuggingFace, listed below), I get the following error:
```
Request failed during generation: Server error: No suitable kernel. h_in=896 h_out=64 dtype=BFloat16
```
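For what it's worth, the shapes in the message look to me like a LoRA down-projection: 896 would be the base model's hidden size and 64 the adapter rank. A minimal sketch of the matmul I assume is failing (my interpretation only; the names are not from the server code):

```python
import torch

hidden_size, rank = 896, 64  # h_in and h_out from the error message

x = torch.randn(1, hidden_size, dtype=torch.bfloat16)          # token hidden state
lora_a = torch.randn(hidden_size, rank, dtype=torch.bfloat16)  # LoRA down-projection A

# The adapter maps 896 -> 64 here; the server apparently has no specialized
# kernel registered for this (h_in, h_out, dtype) combination.
down = x @ lora_a
print(down.shape)  # torch.Size([1, 64])
```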
With streaming enabled, I can see that the first token is generated before the request fails:
data: {"id":"null","object":"chat.completion.chunk","created":0,"model":"null","choices":[{"index":0,"delta":{"role":"assistant","content":"An"},"finish_reason":null}]}
data: {"error":"Request failed during generation: Server error: No suitable kernel. h_in=896 h_out=64 dtype=BFloat16","error_type":"generation"}
System Info
Docker Command:
Hardware:
AWS g6.xlarge
OS:
Amazon Linux
Model Used:
Reproduction
As described above: run the model with any LoRA adapter loaded and send a generation request; it fails after the first token with the `No suitable kernel` error.
Some adapters I tried:
Expected behavior
The model should generate text normally when a LoRA adapter is applied.