
The memory consumption has not decreased #6

Open
chen-hhaa opened this issue Aug 1, 2023 · 7 comments

Comments

@chen-hhaa

I ran test.py on an 8×A100 GPU machine and found that these three methods consume the same amount of memory. Why?

@delveintodetail
Collaborator

Hello, what batch size are you using? Could you explicitly report the memory usage before and after applying DisCo-CLIP? DisCo-CLIP saves more memory when the batch size is large; with a small batch size it only saves a little.
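
For reference, a minimal sketch of how peak memory could be reported (the `run_step` callable is a placeholder for one forward/backward pass with either the vanilla gathered loss or the DisCo-CLIP loss; it is not part of this repository):

```python
import torch

def report_peak_memory(run_step, label):
    """Report peak GPU memory for one training step.

    `run_step` is a placeholder callable that performs one forward/backward
    pass; call this once with the vanilla loss and once with DisCo-CLIP to
    compare the two.
    """
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    run_step()
    peak_gib = torch.cuda.max_memory_allocated() / 1024 ** 3
    print(f"{label}: peak allocated {peak_gib:.2f} GiB")
```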

@cyh1112
Collaborator

cyh1112 commented Aug 2, 2023

Hi, we have fixed the bug in the test.py script that caused the incorrect memory-usage numbers (it did not affect our experiments). You can retest after pulling the latest code.

@Nku-cs-dsc

Hi, thanks for your impressive and inspiring work. I tried to reproduce it on 8 A100 40GB GPUs with the ViT-B/32-based CLIP and increased the batch size from 512 to 1024, but I hit a CUDA OOM error, which contradicts the conclusion in the abstract: "DisCo-CLIP can enable contrastive training of a ViT-B/32 model with a batch size of 32K or 196K using 8 or 64 A100 40GB GPUs." Is there something I'm missing in my reproduction?

@delveintodetail
Collaborator

Thanks very much for your question. There was an error in the test script; it has been fixed. Please try again with the new code and let us know whether your problem is resolved. Thanks.

@delveintodetail
Collaborator


Thanks for your question. In the backbone, we use FP16 and gradient checkpointing, as OpenCLIP does; otherwise the backbone costs too much memory. Please enable FP16 and checkpointing and try again.
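
A rough sketch of what FP16 plus checkpointing can look like in PyTorch (all names here are placeholders rather than the exact training code; `set_grad_checkpointing` assumes an OpenCLIP-style model that exposes activation checkpointing for its backbone):

```python
import torch
from torch.cuda.amp import autocast, GradScaler

def train_epoch_fp16_ckpt(model, loader, optimizer, contrastive_loss):
    """One training epoch with FP16 autocast plus gradient checkpointing.

    `model`, `loader`, `optimizer`, and `contrastive_loss` are placeholders;
    `set_grad_checkpointing` assumes an OpenCLIP-style CLIP model.
    """
    scaler = GradScaler()
    model.set_grad_checkpointing(True)  # assumption: backbone supports checkpointing
    for images, texts in loader:
        optimizer.zero_grad(set_to_none=True)
        with autocast():  # run the forward pass in FP16 where safe
            image_feats, text_feats, logit_scale = model(images.cuda(), texts.cuda())
            loss = contrastive_loss(image_feats, text_feats, logit_scale)
        scaler.scale(loss).backward()  # scaled backward to avoid FP16 underflow
        scaler.step(optimizer)
        scaler.update()
```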

@Nku-cs-dsc

Thanks for your response. I'll try it!

@Nku-cs-dsc

Hi~ I have followed your suggestion and added FP16 and checkpointing to my code. Since DisCo-CLIP reduces memory during the loss-calculation phase, I also tried dropping feature gathering entirely and computing the contrastive loss only on the local batch. However, memory is still not enough with a batch size of 1024 per GPU. How can I improve my code to reproduce your work? Could you post your full training code so I can check where I went wrong?
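
For concreteness, a minimal sketch of such a local-batch contrastive loss (this is the plain InfoNCE loss without any cross-GPU gathering, not the DisCo-CLIP loss; feature tensors are assumed to be L2-normalized):

```python
import torch
import torch.nn.functional as F

def local_clip_loss(image_feats, text_feats, logit_scale):
    """InfoNCE loss over the local batch only (no cross-GPU all_gather).

    `image_feats` and `text_feats` are assumed to be L2-normalized
    [local_batch, dim] tensors; `logit_scale` is the learned temperature.
    """
    logits = logit_scale * image_feats @ text_feats.t()
    labels = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))
```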
