The memory consumption has not decreased #6
Comments
Hello, I wonder what batch size you use? Can you explicitly report the memory usage before and after DisCo-CLIP? DisCo-CLIP saves more memory when the batch size is large; if you only use a small batch_size, it can only save a little memory.
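(For intuition, here is a rough back-of-the-envelope sketch, not taken from the repository: the contrastive loss materializes a B x B logit matrix over the global batch, so the loss-side memory grows quadratically with the batch size, which is why the savings are small at small batch sizes. The FP16 element size and the example batch sizes below are illustrative assumptions.)

```python
def logit_matrix_mib(global_batch_size: int, bytes_per_element: int = 2) -> float:
    """Memory of one B x B logit matrix in MiB (FP16 assumed: 2 bytes/element)."""
    return global_batch_size ** 2 * bytes_per_element / 2 ** 20

# Quadratic growth: the matrix is negligible at B=512 but ~2 GiB at B=32K.
for b in (512, 4096, 32768):
    print(f"global batch {b:6d}: ~{logit_matrix_mib(b):7.1f} MiB per logit matrix")
```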
Hi, we have fixed the bug in the 'test.py' script related to the memory usage error (it does not exist in our experiments). You can retest after pulling the latest code.
Hi, thanks for your impressive and inspiring work. I tried to reproduce it on 8 A100 40G GPUs with a ViT-B/32-based CLIP and increased the batch size from 512 to 1024, but I hit a CUDA OOM error, which contradicts the conclusion in the abstract: "DisCo-CLIP can enable contrastive training of a ViT-B/32 model with a batch size of 32K or 196K using 8 or 64 A100 40GB GPUs." Is there something I am missing in reproducing the code?
Thanks very much for your question. There was an error in the test script; it has been fixed. Please try again with the new code and let us know whether your problem is resolved. Thanks.
Thanks for your question. In the backbone, we use FP16 and gradient checkpointing, as OpenCLIP does; otherwise the backbone costs too much memory. Please enable FP16 and checkpointing and try again.
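(A minimal sketch of these two memory savers, mixed-precision FP16 and activation checkpointing, in plain PyTorch; this is an illustrative example with a dummy model and placeholder loss, not the authors' training script.)

```python
import torch
from torch.cuda.amp import autocast, GradScaler
from torch.utils.checkpoint import checkpoint_sequential

# Dummy stand-in model; the real setting would be a CLIP backbone.
model = torch.nn.Sequential(*[torch.nn.Linear(512, 512) for _ in range(12)]).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = GradScaler()

# requires_grad on the input keeps reentrant checkpointing's backward path alive.
x = torch.randn(1024, 512, device="cuda", requires_grad=True)

optimizer.zero_grad()
with autocast():  # run the forward pass in FP16 where it is safe
    # checkpoint_sequential stores only segment-boundary activations and
    # recomputes the rest during backward, trading compute for memory.
    out = checkpoint_sequential(model, 4, x)
    loss = out.float().pow(2).mean()  # placeholder loss for illustration

scaler.scale(loss).backward()  # loss scaling avoids FP16 gradient underflow
scaler.step(optimizer)
scaler.update()
```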
Thanks for your response. I'll try it!
Hi, I have followed your suggestions and added FP16 and checkpointing to my code. Since DisCo-CLIP reduces memory during the loss calculation phase, I tried to drop feature gathering entirely and compute the contrastive loss only on the local batch. However, memory is still insufficient when the batch size is set to 1024 per GPU. How can I improve my code to reproduce your work? Could you post your full training code so I can check my mistakes?
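(For reference, a minimal sketch of the "local batch only" loss described above, assuming a standard CLIP-style InfoNCE loss rather than the authors' implementation. It removes only the gathered logit matrix on the loss side; per the earlier reply, the backbone activations are the other major memory cost at this batch size, which is why FP16 and checkpointing still matter.)

```python
import torch
import torch.nn.functional as F

def local_clip_loss(image_feats: torch.Tensor,
                    text_feats: torch.Tensor,
                    logit_scale: torch.Tensor) -> torch.Tensor:
    """Symmetric contrastive loss over the local per-GPU batch only (B x B logits)."""
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits_per_image = logit_scale * image_feats @ text_feats.t()  # [B, B]
    labels = torch.arange(image_feats.size(0), device=image_feats.device)
    loss_i = F.cross_entropy(logits_per_image, labels)
    loss_t = F.cross_entropy(logits_per_image.t(), labels)
    return 0.5 * (loss_i + loss_t)

# Example with a local batch of 1024 and a 512-dim embedding (illustrative sizes).
img = torch.randn(1024, 512)
txt = torch.randn(1024, 512)
print(local_clip_loss(img, txt, torch.tensor(100.0)))
```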
I ran test.py on an 8-A100 GPU machine and found that these three methods consume the same amount of memory. Why?