-
Works fine on my end:

```console
$ curl -s --request POST --url http://127.0.0.1:7020/v1/chat/completions \
    --header "Content-Type: application/json" \
    --data '{"messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "Hello, how are you today?" } ], "n_predict": 64}' | jq
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "I'm doing well, thank you for asking! I'm a large language model, so I don't have emotions like humans do, but I'm always ready to help and assist with any questions or tasks you may have. How about you? How's your day going so far?",
        "role": "assistant"
      }
    }
  ],
  "created": 1727176285,
  "model": "gpt-3.5-turbo-0613",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 58,
    "prompt_tokens": 28,
    "total_tokens": 86
  },
  "id": "chatcmpl-vB0PElZDtrMn2PXwBDhF7tkNl54IFFQl"
}
```

Can you show the output of the command above?
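For reference, the same request made through the OpenAI Python client instead of curl would look roughly like this. This is a minimal sketch, assuming openai >= 1.0 and the same 127.0.0.1:7020 endpoint as the curl example above; the dummy API key and placeholder model name are assumptions, not values from the original comment.

```python
from openai import OpenAI

# Point the client at the local llama.cpp server from the curl example above.
# The API key is a dummy value; it is only checked if the server was started with --api-key.
client = OpenAI(base_url="http://127.0.0.1:7020/v1", api_key="sk-no-key-required")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder; the server answers with whichever model it has loaded
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you today?"},
    ],
    max_tokens=64,  # roughly analogous to "n_predict": 64 in the curl payload
)

print(response.choices[0].message.content)
```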
-
I'm using the following GGUF model: https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/blob/main/Meta-Llama-3.1-8B-Instruct-Q6_K_L.gguf

I'm starting the llama.cpp server by running the following command:

When I now try to do a completion with the OpenAI client, it times out (see the sketch at the bottom of this thread):

And here's the log from the llama.cpp server corresponding to the completion request that just times out:

It doesn't return anything. I've tried setting `--n-predict -2` according to #3969 (comment), but that makes the model produce only a single token.

-

I have the same issue with Llama 3: the OpenAI API server just does not work...
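Since the client script that times out is not shown above, here is a hedged sketch of the kind of call that would hit this behaviour, with an explicit client-side timeout so it fails fast instead of hanging. The port 8080, the model name, and the dummy API key are invented placeholders, not values taken from the report.

```python
from openai import OpenAI, APITimeoutError

# Hypothetical reproduction of the hanging request; host, port, API key, and
# model name are placeholders, since the original script is not shown above.
client = OpenAI(
    base_url="http://127.0.0.1:8080/v1",
    api_key="sk-no-key-required",
    timeout=30.0,  # fail after 30 s instead of waiting on the default timeout
)

try:
    response = client.chat.completions.create(
        model="Meta-Llama-3.1-8B-Instruct-Q6_K_L",  # placeholder
        messages=[{"role": "user", "content": "Hello, how are you today?"}],
        max_tokens=64,  # cap generation so an endless reply cannot stall the request
    )
    print(response.choices[0].message.content)
except APITimeoutError:
    print("request timed out; check the server log for the matching request")
```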