
out of memory #10
Open
xziyh opened this issue Dec 6, 2021 · 3 comments

Comments

xziyh commented Dec 6, 2021

Hello author,
I am using a 3080 Ti to train Conditional DETR on the full COCO 2017 dataset, but the program reports a CUDA out-of-memory error even though the 3080 Ti has 12 GB of memory. I used MSI Afterburner to monitor memory usage, and the highest usage it shows is only 2520 MB. I set the batch size to 1.
[screenshot attached]
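For reference, a minimal sketch (assuming PyTorch with CUDA available, any recent 1.x release) of how the same numbers can be read from inside the training process: an external monitor such as MSI Afterburner only samples periodically and can miss the short allocation spike that actually triggers the OOM, whereas the PyTorch caching allocator reports its own allocated, reserved, and peak figures directly.

```python
# Sketch: query GPU memory as the PyTorch caching allocator sees it,
# e.g. right before (or in an except-block around) the failing forward pass.
import torch

device = torch.device("cuda:0")
total = torch.cuda.get_device_properties(device).total_memory  # physical VRAM, bytes
allocated = torch.cuda.memory_allocated(device)   # bytes held by live tensors
reserved = torch.cuda.memory_reserved(device)     # bytes cached by the allocator
peak = torch.cuda.max_memory_allocated(device)    # peak allocation so far

print(f"total     {total / 2**20:8.0f} MB")
print(f"allocated {allocated / 2**20:8.0f} MB")
print(f"reserved  {reserved / 2**20:8.0f} MB")
print(f"peak      {peak / 2**20:8.0f} MB")  # compare this, not a monitor snapshot, to 12 GB
```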

DeppMeng (Collaborator) commented Dec 7, 2021

Hi,

Can you give us the exact training script, including all of the training arguments, so that we can check whether the memory is truly insufficient or whether something else is going on?

xziyh (Author) commented Dec 7, 2021

Hi,
Thanks for your reply. Here are the arguments:
lr=0.0001, lr_backbone=1e-05, batch_size=1, weight_decay=0.0001, epochs=1, lr_drop=40, clip_max_norm=0.1, frozen_weights=None, backbone='resnet50', dilation=False, position_embedding='sine', enc_layers=6, dec_layers=6, dim_feedforward=2048, hidden_dim=256, dropout=0.1, nheads=8, num_queries=300, pre_norm=False, masks=False, aux_loss=True, set_cost_class=2, set_cost_bbox=5, set_cost_giou=2, mask_loss_coef=1, dice_loss_coef=1, cls_loss_coef=2, bbox_loss_coef=5, giou_loss_coef=2, focal_alpha=0.25, dataset_file='coco', coco_path='coco', coco_panoptic_path=None, remove_difficult=False, output_dir='results', device='cuda', seed=42, resume='', start_epoch=0, eval=False, num_workers=2, world_size=1, dist_url='env://', distributed=False)

DeppMeng (Collaborator) commented Dec 7, 2021

That is strange. From your arguments, you are using the resnet50 backbone without dilation, and in that setting 12 GB of memory should be more than enough for batch_size 1. I do not have a clue. Maybe try restarting your computer to make sure that any background programs that might consume GPU memory are killed.
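For reference, a minimal sketch of one way to check whether other processes are holding GPU memory, assuming the nvidia-ml-py (pynvml) bindings are installed; running nvidia-smi from a terminal shows the same information.

```python
# Sketch: list the processes currently holding memory on GPU 0 via NVML
# (the same data nvidia-smi reports). Assumes `pip install nvidia-ml-py3`.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"GPU 0: {mem.used / 2**20:.0f} MB used of {mem.total / 2**20:.0f} MB")

# Anything listed besides the training process is a candidate to close before retraining.
# usedGpuMemory can be None on Windows/WDDM, hence the guard.
for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
    used = proc.usedGpuMemory / 2**20 if proc.usedGpuMemory else 0
    print(f"pid {proc.pid}: {used:.0f} MB")

pynvml.nvmlShutdown()
```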
