
Please add support to connect falcon and llama models with this. #33

Closed
hemangjoshi37a opened this issue Sep 9, 2023 · 6 comments

@hemangjoshi37a

Please add support to connect falcon and llama models with this.

@hemangjoshi37a (Author)

This has been referenced in #27

@j-loquat

One idea for this: allow up to two local models to be loaded and assigned to one or more agents. We could load one model onto the GPU and the other onto the CPU with some RAM allocated to it. So, say, Llama 2 on the GPU for most of the agents, and a smaller Python-optimized model on the CPU for the engineer agent.
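
A minimal sketch of that split, assuming llama-cpp-python as the local backend; the model paths, quantizations, and agent role names are only illustrative, not anything shipped with this project:

```python
# Illustrative sketch: two local GGUF models via llama-cpp-python, one offloaded
# to the GPU for most agents and a smaller one kept in CPU RAM for the engineer.
from llama_cpp import Llama

# Larger general-purpose model: n_gpu_layers=-1 offloads every layer to VRAM.
general_llm = Llama(
    model_path="models/llama-2-13b-chat.Q4_K_M.gguf",  # illustrative path
    n_gpu_layers=-1,
    n_ctx=4096,
)

# Smaller Python-oriented model: n_gpu_layers=0 keeps it entirely in system RAM.
coder_llm = Llama(
    model_path="models/codellama-7b-instruct.Q4_K_M.gguf",  # illustrative path
    n_gpu_layers=0,
    n_ctx=4096,
)

# Route each agent role to one of the two loaded models (role names are made up).
AGENT_MODELS = {
    "ceo": general_llm,
    "cto": general_llm,
    "reviewer": general_llm,
    "engineer": coder_llm,
}

def ask(agent: str, prompt: str) -> str:
    """Send a prompt to whichever model is assigned to the given agent role."""
    out = AGENT_MODELS[agent].create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,
    )
    return out["choices"][0]["message"]["content"]
```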

@andraz commented Sep 10, 2023

In theory a long running process could:

  1. accumulate a queue of prompts/tasks for a specific agent
  2. swap the engine to that agent and load it in 10-30s
  3. perform the actions and save the agent's queued results
  4. then prepare queues for other agents from those results
  5. repeat at 1.

This would let us use big models more efficiently, without accumulating a large time penalty from VRAM loading on every swap.
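
A rough sketch of that loop, again assuming llama-cpp-python as the local backend; the per-agent model paths are illustrative, not taken from this repo. The point is that each model is loaded once per batch of queued prompts, so the 10-30 s load cost is paid per swap rather than per prompt:

```python
# Illustrative queue-and-swap loop: group prompts per agent, load that agent's
# model once, drain its queue, then free VRAM before the next swap.
from collections import defaultdict
from llama_cpp import Llama

AGENT_MODEL_PATHS = {  # illustrative roles and paths
    "engineer": "models/codellama-7b-instruct.Q4_K_M.gguf",
    "reviewer": "models/llama-2-13b-chat.Q4_K_M.gguf",
}

def run_batched(tasks):
    """tasks: list of (agent_name, prompt); returns (agent_name, prompt, reply) triples."""
    # 1. Accumulate a queue of prompts per agent.
    queues = defaultdict(list)
    for agent, prompt in tasks:
        queues[agent].append(prompt)

    results = []
    for agent, prompts in queues.items():
        # 2. Swap the engine: load this agent's model once (the slow 10-30 s step).
        llm = Llama(model_path=AGENT_MODEL_PATHS[agent], n_gpu_layers=-1, n_ctx=4096)
        # 3. Drain the agent's queue and save the results.
        for prompt in prompts:
            out = llm.create_chat_completion(
                messages=[{"role": "user", "content": prompt}],
                max_tokens=512,
            )
            results.append((agent, prompt, out["choices"][0]["message"]["content"]))
        # 4. Drop the model so its VRAM is freed before the next agent's swap.
        del llm
    # 5. Follow-up tasks derived from these results would be queued for the next pass.
    return results
```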

@hemangjoshi37a (Author)

@andraz @j-loquat your suggestions look good to implement.

@j-loquat

One thing to consider with local LLM agents is that we should keep the prompts shorter than for OpenAI and reduce the temperature, perhaps to below 0.5. A lower temperature and shorter prompts make a huge difference in local response times, as seen in the GPT4All project.
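
For illustration, a single call with those two knobs turned down, assuming the same llama-cpp-python backend as in the sketches above; the prompt, path, and values are only examples, not project defaults:

```python
# Illustrative call: short, direct prompt and temperature below 0.5 to keep
# local generation fast and focused.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-13b-chat.Q4_K_M.gguf",  # illustrative path
    n_gpu_layers=-1,
    n_ctx=2048,
)

reply = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are the Engineer. Reply with code only."},
        {"role": "user", "content": "Write a Python function that reverses a string."},
    ],
    temperature=0.3,  # lower than the 0.7+ commonly used with OpenAI models
    max_tokens=256,   # cap output length to keep local response times down
)
print(reply["choices"][0]["message"]["content"])
```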

@Alphamasterliu (Contributor)

Hello, regarding the use of other GPT models or local models, please refer to the discussion on our GitHub page: #27. Some of these models have corresponding configurations in this Pull Request: #53. You may consider forking the project and giving them a try. While our team currently lacks the time to test every model, they have received positive feedback and reviews. If you have any other questions, please don't hesitate to ask. We truly appreciate your support and suggestions, and we are continuously working on more significant features, so please stay tuned. 😊
