[bug]: "Expected all tensors to be on the same device, but found at least two devices" #67
Comments
Potential method:
Key Improvements:
hello, no offence, but this looks like an LLM's reply. On to the topic: we don't support multi-GPU setups with the default configuration. However, you can use these params in the config to do that: https://github.com/abetlen/llama-cpp-python/blob/main/llama_cpp/llama.py#L73-L75 For your system, it could be:
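(A hedged sketch of those parameters as llama-cpp-python keyword arguments, assuming two identical 24 GB P40s and an even split; the model path and exact values are placeholders, not the maintainer's original suggestion.)

```python
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="/models/model.gguf",              # placeholder path
    n_gpu_layers=-1,                              # offload all layers to GPU
    split_mode=llama_cpp.LLAMA_SPLIT_MODE_LAYER,  # split layers across devices
    main_gpu=0,                                   # GPU that holds small/scratch tensors
    tensor_split=[0.5, 0.5],                      # even split between the two P40s
)
```

In the backend's config file these would presumably go under the llama model's keyword arguments rather than being passed directly in Python.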
I can review a PR if you wish to write such a function. It is very much possible, but it might not take the processing speed of those devices into account. Anyhow, it is welcome.
Thanks @kyteinsky, I will try making the change to handle a dynamic number of GPUs. I am interested in doing this not just for the LLM, but particularly for the embedding, because of the server load related to it. I believe this is important for improving the scalability of Context Chat, and scalability matters to anybody using Nextcloud as a company document repository, for knowledge management, and as the core of their RAG implementation. Reducing duplication is priority number one in this regard, with processing efficiency following.
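(A rough sketch of the dynamic-GPU idea, assuming llama-cpp-python's `tensor_split` is the knob being set; `even_tensor_split` is a hypothetical helper, not existing backend code.)

```python
import torch


def even_tensor_split() -> list[float] | None:
    """Split the model evenly across all visible CUDA devices; None means single device/CPU."""
    n = torch.cuda.device_count()
    if n <= 1:
        return None
    # Weighting by free memory (torch.cuda.mem_get_info) would be a refinement,
    # but as noted above it still would not account for per-device processing speed.
    return [1.0 / n] * n


model_kwargs = {
    "n_gpu_layers": -1,
    "tensor_split": even_tensor_split(),
}
```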
I think you can get by just using the options in the instructor's config. We could still benefit from an enhancement that spawns multiple instances of the embedder if the GPU memory allows it; that would be a nice change. LLM scalability is being worked on: we switched to using Nextcloud's Task Processing API to generate the response, so it can use whatever is configured to generate text in Nextcloud, like llm2 or integration_openai. An example is in the default config.
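(A minimal sketch of the "multiple embedder instances" idea, pinning one worker process per GPU through `CUDA_VISIBLE_DEVICES`; `load_embedder` is hypothetical and stands in for however the backend builds its instructor model from the config.)

```python
import os
import multiprocessing as mp


def embedder_worker(gpu_id: int, texts: list[str]) -> None:
    # Restrict this process to one GPU before any CUDA library initializes.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    # model = load_embedder()   # hypothetical factory reading the instructor config
    # model.embed(texts)
    print(f"worker pinned to GPU {gpu_id} would embed {len(texts)} texts")


if __name__ == "__main__":
    batches = [["doc one", "doc two"], ["doc three", "doc four"]]
    procs = [mp.Process(target=embedder_worker, args=(i, b)) for i, b in enumerate(batches)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```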
That is our no. 1 priority right now, yes. We're working on it.
Describe the bug
Context_chat_backend starts giving 500 internal server errors under load from multiple jobs.
Running on a server with two P40 GPUs (24 GB each).
To Reproduce
Steps to reproduce the behavior:
Expected behavior
This appears to be a fairly common error.
It could be because a device map needs to be defined appropriately. See here and here.
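One common way to define such a device map, assuming the model is loaded through Hugging Face Transformers (the model name and memory limits below are placeholders, not the backend's actual values), is to let Accelerate place the weights:

```python
from transformers import AutoModel, AutoTokenizer

model_name = "hkunlp/instructor-large"  # placeholder; substitute the configured model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(
    model_name,
    device_map="auto",                    # let accelerate build the device map
    max_memory={0: "22GiB", 1: "22GiB"},  # leave headroom on each 24 GB P40
)
```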
Server logs (if applicable)
Context Chat Backend logs (if applicable, from the docker container)