
Feature Request: Support for Multiple Simultaneous LLM AI API Endpoints for Self-Hosting and Model Selection #34

Closed
2good4hisowngood opened this issue Sep 28, 2023 · 10 comments

Comments


2good4hisowngood commented Sep 28, 2023

Description:
We would like to propose a new feature for AutoGen that enables users to configure and use multiple large language model (LLM) API endpoints, for self-hosting and for experimenting with different models. This would enhance AutoGen's flexibility and versatility for developers and researchers working with LLMs.

Feature Details:

  1. Endpoint Configuration:

    • Allow users to store API keys and configure multiple endpoints.
    • Implement support for an environment (env) file to securely store sensitive information locally, facilitating scripted pass-through of values, and add it to .gitignore.
  2. Custom Endpoint Names:

    • Provide the ability to assign user-friendly names to each configured endpoint. This helps users easily identify and differentiate between endpoints, and it also allows multiple models to be served from the same endpoint by giving each its own configuration. A check could validate that the endpoint has the expected model loaded and, if not, perform a quick unload/load of the desired model.
  3. Chat Parameters:

    • Integrate settings for chat parameters, such as temperature and other relevant options, that can be adjusted per endpoint. This allows users to fine-tune model behavior.
  4. Model Selection (if applicable):

    • If applicable to the specific LLM, enable users to preset a model for each endpoint. This feature can be especially useful when working with multiple LLMs simultaneously.
  5. API Key Management (if applicable):

    • For LLM services like OpenAI that require API keys, provide a dedicated parameter in each endpoint configuration for users to input and manage their API key.
  6. Endpoint Address:

    • Allow users to specify the endpoint address (URL) to which API requests should be sent. This flexibility is crucial for self-hosted instances or when working with different LLM providers.
  7. Optional - Endpoint Tagging:

    • Allowing tags like #code, #logic, or #budget would give key indicators of where a model's strengths lie and let users select from a pool of models with a particular benefit, allowing more diverse outcomes as well as side-by-side comparisons. It would also enable future result tracking/scoring to identify which models are best at particular tasks: by running multiple #code models and testing each one's results, under-performing models can be identified and retrained or replaced to build an optimal workflow (a purely illustrative configuration sketch follows this list).
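
A purely illustrative sketch of what such an endpoint registry could look like; the field names here are hypothetical, not an existing AutoGen schema:

import os

endpoints = [
    {
        "name": "local-codellama",                            # user-friendly endpoint name
        "base_url": "http://localhost:5000/v1",               # endpoint address requests are sent to
        "model": "codellama-13b",                             # preset model for this endpoint
        "api_key": os.environ.get("LOCAL_API_KEY", "NULL"),   # pulled from a local env file
        "temperature": 0.2,                                   # per-endpoint chat parameters
        "tags": ["#code"],                                    # optional strength tags
    },
    {
        "name": "openai-gpt4",
        "base_url": "https://api.openai.com/v1",
        "model": "gpt-4",
        "api_key": os.environ.get("OPENAI_API_KEY"),
        "temperature": 0.7,
        "tags": ["#logic"],
    },
]

# Example: select the endpoints tagged for code work when building a coding agent.
code_endpoints = [e for e in endpoints if "#code" in e["tags"]]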

Expected Benefits:
This feature will benefit developers, researchers, and users who work with LLMs by offering a centralized and user-friendly interface for managing multiple AI API endpoints. It enhances the ability to experiment with various models, configurations, and providers while maintaining security and simplicity. It could allow different agents to leverage specific fine-tuned models rather than the same model for each, and it could let self-hosted users expand the number of repeated, looped calls without drastically increasing the bill.

Additional Notes:
Consider implementing an intuitive user interface for configuring and managing these endpoints within the GitHub platform, making it accessible to both novice and experienced users.

References:
Include any relevant resources or references that support the need for this feature, such as the growing popularity of LLMs in various fields and the demand for flexible API management solutions.

Related Issues/Pull Requests:

Assignees:
If you permit this ticket to remain open, I will assemble some links and resources, as well as open another ticket to handle TextGenWebUI, with relevant links there on implementing it. I can try implementing it and submitting a PR if someone else doesn't get to it first.

Thank you for considering this feature request. I believe that this enhancement will greatly benefit the AutoGen community and its users working with Language Model AI API endpoints.

edit: 9.28
Looking through the repo, it looks like there's a standardized JSON config; I'm going to look into this next as a method for expanding and holding the features listed above. A page found while reading the documentation shows near the top how it loads the JSON and then references it further down: https://github.com/microsoft/autogen/blob/main/notebook/agentchat_groupchat_research.ipynb

Found it https://github.com/microsoft/autogen/blob/main/OAI_CONFIG_LIST_sample

Going to look into how it gets loaded.
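
For anyone following along, the sample appears to get loaded via autogen.config_list_from_json, roughly like this (the model names in the filter are just examples):

import autogen

# Reads the JSON list from the OAI_CONFIG_LIST environment variable or file
# and keeps only the entries whose "model" matches the filter.
config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={"model": ["gpt-4", "gpt-3.5-turbo"]},
)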

@2good4hisowngood 2good4hisowngood changed the title Feature Request: Support for Multiple LLM AI API Endpoints for Self-Hosting and Model Selection Feature Request: Support for Multiple Simultaneous LLM AI API Endpoints for Self-Hosting and Model Selection Oct 6, 2023

2good4hisowngood commented Oct 6, 2023

If using textgenwebui locally, it'd be great to be able to switch between models without having to host multiple models simultaneously.

Get models

import requests

HOST = '0.0.0.0:5000'

def model_api(request):
    response = requests.post(f'http://{HOST}/api/v1/model', json=request)
    return response.json()

model_api({'action': 'list'})['result']

Load models

def model_load(model_name):
    return model_api({'action': 'load', 'model_name': model_name})

So something like: get the currently loaded model; if it matches the model listed for the agent, continue; otherwise load the desired model for the agent. A rough sketch is below.
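
A minimal sketch of that logic, assuming the webui's model endpoint supports an 'info' action that reports the currently loaded model (as in its API example scripts; exact field names may vary by version):

def ensure_model_loaded(desired_model):
    # Ask the server which model is currently loaded ('info' action assumed).
    current = model_api({'action': 'info'})['result'].get('model_name')
    if current == desired_model:
        return current
    # Otherwise swap in the model this agent expects.
    return model_load(desired_model)

# e.g. ensure_model_loaded('codellama-13b') before handing control to a coding agent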

@sonichi sonichi added the llm label Oct 22, 2023

taoyiran commented Nov 2, 2023

Wow, I am deeply appreciative of your work. I am looking for a way to achieve something similar, such as connecting AutoGen to an LLM endpoint like Qwen-turbo's online service. May I join your team, or at least contribute something?

@Pakmandesign

Same! Happy to help.


ImagineL commented Nov 6, 2023

@taoyiran So am I! Supporting the Qwen online service is important to me. Feel free to contact me anytime if you need assistance.


taoyiran commented Nov 6, 2023

@ImagineL Glad to see your message! Right now I am studying this project and trying to connect AutoGen to Qwen's online service. I will update my status here and, if possible, share my code. Thanks, everyone!


ImagineL commented Nov 6, 2023

@taoyiran I'm looking forward to you sharing your code! I analyzed the source code, and it seems hard to achieve without modifying it. Good luck!


weldonla commented Nov 9, 2023

While looking for how to do this, I found this thread, and I also found the answer: you can just set multiple configurations. It seems this feature is already implemented, if I understand the feature request correctly.

from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager

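# Three OpenAI-compatible endpoints running locally on different ports; the
# api_key value is a placeholder since these local servers don't check it.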
config_list = [
    {
        "api_type": "open_ai",
        "api_base": "http://localhost:1234/v1",
        "api_key": "NULL"
    }
]
config_list2 = [
    {
        "api_type": "open_ai",
        "api_base": "http://localhost:1235/v1",
        "api_key": "NULL"
    }
]
config_list3 = [
    {
        "api_type": "open_ai",
        "api_base": "http://localhost:1236/v1",
        "api_key": "NULL"
    }
]

llm_config = {
    "config_list": config_list,
    "seed": 47,
    "temperature": 0.5,
    "max_tokens": -1,
    "request_timeout": 6000
}
llm_config2 = {
    "config_list": config_list2,
    "seed": 47,
    "temperature": 0.5,
    "max_tokens": -1,
    "request_timeout": 6000
}
llm_config3 = {
    "config_list": config_list3,
    "seed": 47,
    "temperature": 0.5,
    "max_tokens": -1,
    "request_timeout": 6000
}

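# Give each agent its own llm_config so it talks to its own endpoint.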
user_proxy = UserProxyAgent(
    name="user_proxy",
    system_message="A human admin.",
    max_consecutive_auto_reply=10,
    llm_config=llm_config,
    human_input_mode="ALWAYS"
)

person_1 = AssistantAgent(
    name="person_1",
    system_message="sys_message",
    llm_config=llm_config2,
)

person_2 = AssistantAgent(
    name="person_2",
    system_message="sys_message",
    llm_config=llm_config3,
)

person_3 = AssistantAgent(
    name="person_3",
    system_message="sys_message",
    llm_config=llm_config,
)

groupchat = GroupChat(
    agents=[user_proxy, person_1, person_2, person_3], messages=[]
)
manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config)

user_proxy.initiate_chat(manager, message="""message""")

@ishaan-jaff

@2good4hisowngood @taoyiran @Pakmandesign @ImagineL @weldonla you can do this using the LiteLLM Proxy Server.
It can handle 500+ requests/second.

Here's the quick start:

Doc: https://docs.litellm.ai/docs/simple_proxy#load-balancing---multiple-instances-of-1-model

Step 1: Create a config.yaml

model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
      api_version: "2023-05-15"
      api_key: 
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key: 
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key: 
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/

Step 2: Start the litellm proxy:

litellm --config /path/to/config.yaml

Step 3: Make a request to the LiteLLM proxy:

curl --location 'http://0.0.0.0:8000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
      "model": "gpt-4",
      "messages": [
        {
          "role": "user",
          "content": "what llm are you"
        }
      ]
    }'
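
On the AutoGen side, pointing an agent at the proxy is then just another config entry. A sketch reusing the old-style fields from the example earlier in this thread (the api_key is a dummy value, since the real credentials live in the proxy's config.yaml):

from autogen import AssistantAgent

config_list_proxy = [
    {
        "api_type": "open_ai",
        "api_base": "http://0.0.0.0:8000",  # the LiteLLM proxy started above
        "api_key": "NULL",                  # dummy; the proxy holds the real keys
        "model": "gpt-4",                   # matched against the proxy's model_list
    }
]

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list_proxy},
)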

@ImagineL

This looks very reliable, thank you! I'm going to try it!

@thinkall
Collaborator

We are closing this issue due to inactivity; please reopen if the problem persists.
