I'm working with an 8-GPU system with P100 16GB cards on Ubuntu 18.04 LTS.
With these P100 16GB GPUs I can only run Stylegan2 with BATCH=16. At BATCH=24 (or the default), one of the 8 GPUs, the one running the 8 python workers, runs out of memory.
I have swapped in some P40s with 24GB, and everything runs fine with the mixed GPUs (I assume because they are all Pascal), but Stylegan always selects one of the P100 16GB GPUs for running the 8 python workers. I have tried moving the 24GB P40s around: position 1, position 2, positions 1 and 2. Stylegan always finds the first P100 and runs the workers there, and training quickly fails with a "not enough GPU memory" error because the P100 was selected.
At Batch=16, my python worker GPU runs at around 14.5 GB and the other GPUs at around 10.5 GB. Obviously, at larger batch sizes the python worker GPU temporarily spikes above 16GB, causing the memory failure. If I could control which GPU hosts the python workers (i.e. get Stylegan to put them on my 24GB P40), then perhaps I could train at Batch=32.
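One lever that might help, offered only as an untested sketch: CUDA lets you reorder the GPUs a process sees via the `CUDA_VISIBLE_DEVICES` environment variable, and the first GPU listed becomes that process's device 0. This assumes (not confirmed here) that Stylegan2 puts the extra worker load on whichever GPU it sees as device 0. Setting `CUDA_DEVICE_ORDER=PCI_BUS_ID` makes CUDA's indices match `nvidia-smi`; CUDA's default order is `FASTEST_FIRST`, which may be why physically moving the P40s didn't change which card was picked. The helper names below are hypothetical, and the memory figures in the example are illustrative.

```python
import os


def parse_gpu_memory(csv_text):
    """Parse output of
    `nvidia-smi --query-gpu=index,memory.total --format=csv,noheader,nounits`
    into a list of (index, total_MiB) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, mem = [field.strip() for field in line.split(",")]
        gpus.append((int(index), int(mem)))
    return gpus


def master_first_order(gpus):
    """Build a CUDA_VISIBLE_DEVICES string that lists the GPU with the most
    memory first (it becomes device 0) and keeps the rest in index order.
    On a tie, max() keeps the lowest-index GPU."""
    master_index, _ = max(gpus, key=lambda g: g[1])
    rest = [index for index, _ in gpus if index != master_index]
    return ",".join(str(i) for i in [master_index] + rest)


def training_env(gpus, base_env=None):
    """Return a copy of the environment with the large-memory GPU as device 0.
    Assumption: the training script is launched as a child process with this env."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # make CUDA indices match nvidia-smi
    env["CUDA_VISIBLE_DEVICES"] = master_first_order(gpus)
    return env


if __name__ == "__main__":
    # Illustrative 8-GPU box: a 24GB P40 at index 3, 16GB P100s elsewhere.
    sample = "\n".join(
        f"{i}, {24451 if i == 3 else 16280}" for i in range(8)
    )
    env = training_env(parse_gpu_memory(sample), base_env={})
    print(env["CUDA_VISIBLE_DEVICES"])  # 3,0,1,2,4,5,6,7
```

The shell equivalent would be prefixing the usual launch command, e.g. `CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=3,0,1,2,4,5,6,7 python <your training script>`. Only the order of GPUs changes, not how many are used.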
I have also purchased another 8-GPU system. If I can't use more than 10.5 GB on the other 7 GPUs anyway, I might as well buy 7 12GB P100s and swap in a single 16GB P100 to see if that configuration works. I know that theory didn't work with the P40 swap, but I'm hoping that using the exact same GPU model might give different results.
Does anyone know if I can control the python worker GPU selection within Stylegan2? This might solve my problem on both systems, and save other people considerable GPU investment, by running Stylegan with one "master" GPU with more memory and several "worker" GPUs with 4GB less memory.
Then there is the longer-term question: why doesn't Stylegan distribute the python worker processes across all GPUs? Not everyone has an 8-GPU system with A100s or V100s and 24-32+GB of memory. (Perhaps one dedicated GPU needs to consolidate the results computed on the other GPUs, which would be understandable?)