[Bugfix] Fix the default value for temperature in ChatCompletionRequest #11219
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
This modifies the default behavior of the vLLM OpenAI Compatible Server. Considering that temperature is very likely to be customized by users, I believe the impact is minimal.
Some tests failed, but they don't seem to be caused by my PR. @simon-mo
Out of curiosity, why was this change made? 0.7 has been the default temperature for a long time, and changing from 0.7 to 1.0 is not a small difference in terms of behavior. In particular, it may impact the quality of tool calls that use auto tool choice without a manually specified temperature.
The default temperature for offline inference (the LLM class) is 1.0, the same as OpenAI's official implementation. It's also the default in most frameworks. I didn't check past PRs to see why the OpenAI Compatible Server uses 0.7 as the default; it's a bit odd. Also, as I mentioned before, in temperature-sensitive cases it's common for users to set their own temperature, so this shouldn't affect too many use cases. Maybe we should note this change in the OpenAI Compatible Server docs? @simon-mo
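For workflows that must not drift when server-side defaults change, clients can pin the temperature explicitly on every request. A minimal sketch using the official `openai` Python client against a vLLM OpenAI Compatible Server (the base URL, API key, and model name below are placeholders, not values from this PR):

```python
from openai import OpenAI

# Point the client at a locally running vLLM OpenAI Compatible Server.
# Base URL, API key, and model name are placeholder assumptions.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    # Pin temperature explicitly so behavior does not depend on the
    # server-side default (0.7 before this PR, 1.0 after).
    temperature=0.7,
)
print(response.choices[0].message.content)
```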
I actually don't recall why it is 0.7, given that both Hugging Face and OpenAI have it at 1.0. Are we aware of any breakage?
This value has been 0.7 since it was introduced; see #116. What's strange is that ChatCompletionRequest uses 0.7 while CompletionRequest uses 1.0. I searched Google for the source of 0.7, and it seems to be because OpenAI chatbots like ChatGPT use it (though there's no solid proof). The default for OpenAI's Chat Completion API is 1.0, and I think aligning with that makes sense.
Aligning with the default behavior 100% makes sense, but we also want to avoid breaking workflows that depend on the existing defaults. Under semver this should be at least a minor version bump, if not a major one given the potentially breaking behavior change, although it's definitely fuzzy: the change is syntactically backwards-compatible, but arguably not semantically.
FIX #10930
Set the default value for `temperature` in `ChatCompletionRequest` to 1.0.

References:
- vllm/vllm/sampling_params.py, line 176 at 571da8f
- https://platform.openai.com/docs/api-reference/chat
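For context, the change itself amounts to updating a Pydantic field default on the request model. A minimal sketch of the resulting shape (class and field names match vLLM's request models in `vllm/entrypoints/openai/protocol.py`; all surrounding fields are omitted):

```python
from typing import Optional

from pydantic import BaseModel


class ChatCompletionRequest(BaseModel):
    # Before this PR: temperature: Optional[float] = 0.7
    # Now aligned with CompletionRequest and the OpenAI Chat Completion API.
    temperature: Optional[float] = 1.0


class CompletionRequest(BaseModel):
    temperature: Optional[float] = 1.0  # unchanged; was already 1.0
```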