
docker-compose-wsl2-and-linux-nvidia-gpu-ai.yml #7

Closed

Conversation


@coresolutiondoteu coresolutiondoteu commented Sep 23, 2024

Description

GPU Enabled AI Assistant Container for Linux and Windows OS with Docker under WSL2 (for Windows).

Explanation:

For this to work, you need Linux and/or Windows with an updated WSL2. In
either case, the machine must be equipped with a dedicated GPU (with CUDA
cores) that is usable for running LLMs.

If that is the case, the ollama service will start with GPU support, and the
AI assistant will respond far faster than with the CPU alone.

Notebook users should watch out for NVIDIA Optimus, which can break things.
If you have an Optimus-enabled notebook, switch to the dedicated GPU
(sometimes this has to be done directly in the BIOS).

Updated drivers must be installed.

A functional check can be done as described at https://docs.docker.com/desktop/gpu/

docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

More details (Linux): https://ollama.com/blog/ollama-is-now-available-as-an-official-docker-image
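As a sketch of what the compose file in this PR enables: Docker Compose exposes NVIDIA GPUs to a service through `deploy.resources.reservations.devices`. The service name and image tag below are assumptions for illustration; adjust them to match the actual file.

```yaml
services:
  ollama:
    image: ollama/ollama   # assumed image; use the tag this repo actually pins
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all            # or a specific number of GPUs
              capabilities: [gpu]
```

This requires the NVIDIA Container Toolkit on Linux, or Docker Desktop with WSL2 GPU paravirtualization on Windows.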

Type of change

  • New feature (non-breaking change which adds functionality)
    GPU Enabled AI Assistant Container for Linux and Windows with Docker under WSL2.

How Has This Been Tested?

Tested several times across many different scenarios. The Optimus-enabled notebook issue is described in the explanation above.


coresolutiondoteu commented Oct 6, 2024

One thing that is still unclear to me: after some idle time the GPU frees its resources, which means the AI becomes unresponsive for a while until the GPU memory is loaded with the model data again. That reloading is a fairly intense process for the main CPU. I will look for a dockerd/WSL configuration option that solves this problem, and I will also retest Optimus, since my earlier assumption may have been wrong (this happened with Optimus as well). More testing will be done and then I will update the PR. Linux can't be tested due to time constraints.
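If the unresponsiveness comes from ollama unloading the model after idle time (by default ollama unloads a model roughly five minutes after its last use), the `OLLAMA_KEEP_ALIVE` environment variable may help. A minimal sketch, assuming the service is named `ollama` as above:

```yaml
services:
  ollama:
    environment:
      - OLLAMA_KEEP_ALIVE=24h   # keep the model loaded; -1 keeps it loaded indefinitely
```

This keeps the model resident in GPU memory between requests, avoiding the CPU-heavy reload described above, at the cost of permanently occupying that VRAM.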

@HappyStoic
Collaborator

Hello @coresolutiondoteu, thank you for the PR, we appreciate it a lot! Sorry it took me so long to reply.

Currently one of our goals/constraints is to have the lab working out of the box on most machines:

  • Windows/macOS/Linux
  • with or without GPU
  • any processor architecture

Even though we would like to make the AI assistant faster, this change could break the lab on several machines, right?

Do you think you could instead make a section in the repository's discussions page explaining for others how to modify the docker-compose file to enable GPU acceleration? We could then incorporate this text to our wiki page. Thank you!
