What will happen to the API when load balancing is enabled and rate limiting is enabled? #8314
Unanswered
yuyongkratos asked this question in Help
Replies: 0 comments
Self Checks
1. Is this request related to a challenge you're experiencing? Tell me about your story.
I want to enable the language model load balancing feature in Dify, but I don't know exactly how it works.
My current situation is that many different business workloads need to call the large language model, for example: Q&A, article translation, information extraction, etc. Because these workloads call it so frequently, the model service is often suspended (rate-limited).
So I plan to enable the Model Load Balancing feature, but I don't understand its working mode in detail.
I have deployed two local models, Model_1 and Model_2; both are qwen2-72b-chat, and many of the APIs provided through Dify use qwen2-72b-chat.
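For context, the behaviour I am expecting (purely my assumption, since the docs do not confirm the strategy) is something like round-robin rotation with failover across the two endpoints. A minimal sketch of that assumed behaviour, with hypothetical endpoint names taken from my setup:

```python
import itertools

class RoundRobinBalancer:
    """Rotate requests across model endpoints; skip endpoints that fail.

    Hypothetical illustration only -- Dify's actual load-balancing
    strategy is not documented in detail, so round-robin with failover
    is an assumption here, not Dify's confirmed behaviour.
    """

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self._cycle = itertools.cycle(self.endpoints)

    def call(self, prompt, invoke):
        # Try each endpoint at most once, starting from the next in rotation.
        for _ in range(len(self.endpoints)):
            endpoint = next(self._cycle)
            try:
                return endpoint, invoke(endpoint, prompt)
            except RuntimeError:
                # e.g. the endpoint is rate-limited; move on to the next one
                continue
        raise RuntimeError("all endpoints are unavailable")

# Example: Model_2 is rate-limited, so traffic falls back to Model_1.
def fake_invoke(endpoint, prompt):
    if endpoint == "Model_2":
        raise RuntimeError("429 rate limited")
    return f"{endpoint} answered: {prompt}"

balancer = RoundRobinBalancer(["Model_1", "Model_2"])
results = [balancer.call("hello", fake_invoke)[0] for _ in range(4)]
print(results)  # every successful call lands on Model_1
```

If that assumption holds, enabling rate limiting on top of load balancing should mean a throttled endpoint is skipped and the other keeps serving, but I would like confirmation of the actual mechanism.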
I have the following questions:
2. Additional context or comments
I found a page about load balancing, but it does not go into detail about what effect load balancing actually has.
The link is as follows:
[load-balancing](https://docs.dify.ai/v/zh-hans/guides/model-configuration/load-balancing)