Is there any way to control the python workers on Stylegan2 #308

Open
lawleenaja opened this issue Aug 13, 2024 · 0 comments
lawleenaja commented Aug 13, 2024

I'm working with an 8-GPU system of 16 GB P100s, running Ubuntu 18.04 LTS.

With these 16 GB P100s I can only run Stylegan2 with BATCH=16, because at BATCH=24 (or the default) one of the 8 GPUs consumes too much memory: the GPU running the 8 python workers.

I have swapped in some 24 GB P40s, and everything runs fine with the mixed GPUs (I assume because they are all Pascal), but Stylegan always selects one of the 16 GB P100s to run the 8 python workers. I have tried moving the 24 GB P40s around: position 1, position 2, positions 1 and 2. Stylegan always finds the first P100 and runs the workers there, and training quickly fails with a "not enough GPU memory" error because the P100 was selected.

At BATCH=16 my python-worker GPU runs at around 14.5 GB and the other GPUs at around 10.5 GB. That python-worker GPU is evidently spiking above 16 GB temporarily, causing the memory failure. If I could control the python worker GPU selection, then perhaps I could train at BATCH=32 by getting Stylegan to put the python workers on a 24 GB P40.
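For anyone who wants to confirm which device spikes on their own rig, here is a minimal framework-neutral check: poll nvidia-smi alongside training. (The helper below is just a sketch, not part of Stylegan.)

```python
import subprocess
import time

def sample_gpu_memory():
    """Print current per-GPU memory use as reported by nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=index,name,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    for line in out.strip().splitlines():
        idx, name, used, total = [field.strip() for field in line.split(",")]
        print(f"GPU {idx} ({name}): {used} MiB / {total} MiB used")

# Poll once a second in a separate terminal while training runs,
# to catch the transient spike on the worker GPU.
while True:
    sample_gpu_memory()
    time.sleep(1)
```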

I have also purchased another 8-GPU system, but I'm thinking that if I can't use more than 10.5 GB on the other 7 GPUs, I might as well buy seven 12 GB P100s, swap in a single 16 GB P100, and see if that configuration works. I know that theory didn't work with the P40 swap, but I'm hoping that introducing the exact same GPU model might have different results.

Does anyone know if I can "control the python worker GPU selection" within Stylegan2? This might solve my problem on both systems, and it could save other people considerable GPU investment by letting them run Stylegan with one "master" GPU with more memory and several "worker" GPUs with 4 GB less.
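One workaround that may be worth trying (generic CUDA behavior, not a Stylegan-specific feature): the CUDA runtime lets you remap which physical card becomes logical GPU 0 via environment variables, and the workers appear to land on logical device 0. These must be set before CUDA initializes, i.e. at the very top of the training script or in the shell:

```python
import os

# Must be set before torch / tensorflow initializes CUDA (i.e. before any
# CUDA-touching imports in the training script).
# PCI_BUS_ID makes device numbering match nvidia-smi; the default
# "FASTEST_FIRST" ordering can differ from it.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"

# Hypothetical layout: if a 24 GB P40 sits at PCI index 2, this makes it
# logical GPU 0 (where the workers land) and leaves the P100s as GPUs 1-7.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,0,1,3,4,5,6,7"
```

Equivalently from the shell, prefix the training command with `CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=2,0,1,3,4,5,6,7` (adjusting the indices to wherever the P40s actually sit).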

Then there is the longer-term question: why doesn't Stylegan distribute the python worker processes across all GPUs? Not everyone has an 8-GPU system of A100s or V100s with 24-32+ GB of memory. (Perhaps one dedicated GPU needs to consolidate the results computed across the other GPUs, which would be understandable?)
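For context, that consolidation pattern is common in multi-GPU training: each replica computes on its own device, but one rank reduces the results, so that rank's GPU needs extra headroom. A minimal PyTorch-style sketch of the idea (illustrative only, not Stylegan2's actual code):

```python
import torch.distributed as dist

def consolidate_on_rank0(local_stats):
    """Sum each rank's tensor into rank 0 and average it there.

    Only rank 0 ends up holding the combined result, which is one reason
    the rank-0 GPU needs more free memory than the others.
    """
    dist.reduce(local_stats, dst=0, op=dist.ReduceOp.SUM)
    if dist.get_rank() == 0:
        return local_stats / dist.get_world_size()
    return None
```

If the real asymmetry instead comes from optimizer state or evaluation passes being pinned to GPU 0, the environment-variable remap above is still the practical lever.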
