Just curious: does ray-llm fully leverage Ray Serve autoscaling (https://docs.ray.io/en/latest/serve/autoscaling-guide.html)?

It seems Ray Serve only supports `target_num_ongoing_requests_per_replica` and `max_concurrent_queries`. As we know, LLM output length varies widely, so request-count-based metrics are not a good fit for LLM workloads. How do you achieve better autoscaling support for LLMs? A sketch of the configuration I mean is below.
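For concreteness, here is a minimal sketch of the kind of Ray Serve autoscaling configuration the question is asking about. The `LLMDeployment` class and the specific numbers are hypothetical placeholders; the parameter names follow older Ray Serve releases (newer versions rename them to `max_ongoing_requests` / `target_ongoing_requests`).

```python
from ray import serve
from starlette.requests import Request


@serve.deployment(
    # Cap on in-flight requests per replica (older Ray Serve name).
    max_concurrent_queries=10,
    autoscaling_config={
        # Scale replicas so each one has roughly this many ongoing requests.
        "target_num_ongoing_requests_per_replica": 5,
        "min_replicas": 1,
        "max_replicas": 8,
    },
)
class LLMDeployment:
    async def __call__(self, request: Request) -> str:
        prompt = (await request.json())["prompt"]
        # Placeholder: a real deployment would run model inference here.
        return f"echo: {prompt}"


app = LLMDeployment.bind()
# serve.run(app)  # deploy with Ray Serve
```

Because both knobs count requests rather than tokens, a replica serving a few very long generations looks as "busy" as one serving many short ones, which is the mismatch the question points at.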