Bug/Feature Request: GPU Memory Management for Dual-GPU Systems #119

SeriousPaul1270 opened this issue Sep 11, 2024 · 0 comments

I've got a bit of an issue here. I'm running two GPUs: an Nvidia card with 8GB of VRAM and an AMD card with 4GB. When loading a model, I'm limited to a total of 8GB of VRAM, which isn't too bad. However, if there's any load on the AMD card, the model won't load at all.

I've also noticed that when the model is loaded onto the Nvidia card (which has more than enough memory for it on its own), about 3.5GB ends up on the AMD card and the rest stays on the Nvidia card. That seems like a waste of resources to me: why not just let the model use all the VRAM available on the Nvidia card?

I'm wondering if this is a bug or if there's some specific reason for how OpenWebUI handles GPU memory. If it's intentional, maybe we could revisit the memory allocation strategy? I'd love to get your thoughts on this.
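For what it's worth, here's a rough workaround sketch I'd expect to help (untested, and it assumes the backend is Ollama and that it honors the standard GPU visibility variables: CUDA_VISIBLE_DEVICES for Nvidia and HIP_VISIBLE_DEVICES for ROCm). It checks free VRAM on the Nvidia side, then launches the backend with the AMD card hidden so the model should land entirely on the 8GB card:

```python
import os
import subprocess

def free_vram_mib() -> list[int]:
    """Free memory per NVIDIA GPU, in MiB, reported by nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return [int(line) for line in out.strip().splitlines()]

print("Free VRAM per NVIDIA GPU (MiB):", free_vram_mib())

# Expose only NVIDIA GPU 0 and (assumption) hide the AMD card from ROCm.
# HIP_VISIBLE_DEVICES="-1" mirrors the CUDA convention for "no devices";
# double-check the exact value against your ROCm version.
env = dict(
    os.environ,
    CUDA_VISIBLE_DEVICES="0",
    HIP_VISIBLE_DEVICES="-1",
)

# Launch the backend (assumes Ollama) with the restricted GPU set.
subprocess.run(["ollama", "serve"], env=env, check=True)
```

Even if that works, it's only a stopgap. Being able to configure per-GPU allocation from OpenWebUI itself, or at least to prefer the card with the most free VRAM, would be the real fix.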

Thanks in advance!
