The memory consumption has not decreased #6
Comments
Hello, I wonder what batch size you use? Can you explicitly report the memory usage before and after DisCo-CLIP? DisCo-CLIP saves more memory when the batch size is large; if you only use a small batch_size, it can only save a little memory.
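(For intuition, here is a rough back-of-the-envelope sketch, not taken from the repository: the contrastive loss materializes a B x B logit matrix over the global batch, so the loss-side memory grows quadratically with the batch size, which is why the savings are small at small batch sizes. The FP16 element size and the example batch sizes below are illustrative assumptions.)

```python
def logit_matrix_mib(global_batch_size: int, bytes_per_element: int = 2) -> float:
    """Memory of one B x B logit matrix in MiB (FP16 assumed: 2 bytes/element)."""
    return global_batch_size ** 2 * bytes_per_element / 2 ** 20

# Quadratic growth: the matrix is negligible at B=512 but ~2 GiB at B=32K.
for b in (512, 4096, 32768):
    print(f"global batch {b:6d}: ~{logit_matrix_mib(b):7.1f} MiB per logit matrix")
```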
Hi, we have fixed the bug in the 'test.py' script related to the memory usage error (it does not exist in our experiments). You can retest after pulling the latest code.
Hi, thanks for your impressive and inspiring work. I tried to reproduce it on 8 A100 40G GPUs with a ViT-B/32-based CLIP and increased the batch size from 512 to 1024, but I hit a CUDA OOM error, which contradicts the conclusion in the abstract: "DisCo-CLIP can enable contrastive training of a ViT-B/32 model with a batch size of 32K or 196K using 8 or 64 A100 40GB GPUs." Is there something I am missing in reproducing the code?
Thanks very much for your question. There was an error in the test script; it has been fixed. Please try again with the new code and let us know whether your problem is resolved. Thanks.
Thanks for your question. In the backbone, we use FP16 and gradient checkpointing, as OpenCLIP does; otherwise the backbone costs too much memory. Please enable FP16 and checkpointing and try again.
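(A minimal sketch of these two memory savers, mixed-precision FP16 and activation checkpointing, in plain PyTorch; this is an illustrative example with a dummy model and placeholder loss, not the authors' training script.)

```python
import torch
from torch.cuda.amp import autocast, GradScaler
from torch.utils.checkpoint import checkpoint_sequential

# Dummy stand-in model; the real setting would be a CLIP backbone.
model = torch.nn.Sequential(*[torch.nn.Linear(512, 512) for _ in range(12)]).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = GradScaler()

# requires_grad on the input keeps reentrant checkpointing's backward path alive.
x = torch.randn(1024, 512, device="cuda", requires_grad=True)

optimizer.zero_grad()
with autocast():  # run the forward pass in FP16 where it is safe
    # checkpoint_sequential stores only segment-boundary activations and
    # recomputes the rest during backward, trading compute for memory.
    out = checkpoint_sequential(model, 4, x)
    loss = out.float().pow(2).mean()  # placeholder loss for illustration

scaler.scale(loss).backward()  # loss scaling avoids FP16 gradient underflow
scaler.step(optimizer)
scaler.update()
```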
Thanks for your response. I'll try it!
Hi, I have followed your suggestions and added FP16 and checkpointing to my code. Since DisCo-CLIP reduces memory during the loss calculation phase, I tried to drop feature gathering entirely and compute the contrastive loss only on the local batch. However, memory is still insufficient when the batch size is set to 1024 per GPU. How can I improve my code to reproduce your work? Could you post your full training code so I can check my mistakes?
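(For reference, a minimal sketch of the "local batch only" loss described above, assuming a standard CLIP-style InfoNCE loss rather than the authors' implementation. It removes only the gathered logit matrix on the loss side; per the earlier reply, the backbone activations are the other major memory cost at this batch size, which is why FP16 and checkpointing still matter.)

```python
import torch
import torch.nn.functional as F

def local_clip_loss(image_feats: torch.Tensor,
                    text_feats: torch.Tensor,
                    logit_scale: torch.Tensor) -> torch.Tensor:
    """Symmetric contrastive loss over the local per-GPU batch only (B x B logits)."""
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits_per_image = logit_scale * image_feats @ text_feats.t()  # [B, B]
    labels = torch.arange(image_feats.size(0), device=image_feats.device)
    loss_i = F.cross_entropy(logits_per_image, labels)
    loss_t = F.cross_entropy(logits_per_image.t(), labels)
    return 0.5 * (loss_i + loss_t)

# Example with a local batch of 1024 and a 512-dim embedding (illustrative sizes).
img = torch.randn(1024, 512)
txt = torch.randn(1024, 512)
print(local_clip_loss(img, txt, torch.tensor(100.0)))
```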
I ran test.py on an 8-A100 GPU machine and found that these three methods consume the same amount of memory. Why?