
Great job on this project! When I run bash exp.sh, training does not start and the following error is reported. #2

Open
miaomiaocoder opened this issue Nov 10, 2023 · 3 comments

@miaomiaocoder

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/root/autodl-tmp/sum/Multimodal-Summarization/oscar.py:231: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  input_ids = torch.tensor(sent[:,0,:].clone().detach(),dtype=torch.long).cuda()
/root/autodl-tmp/sum/Multimodal-Summarization/oscar.py:232: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  input_mask = torch.tensor(sent[:,1,:].clone().detach(),dtype=torch.long).cuda()
/root/autodl-tmp/sum/Multimodal-Summarization/oscar.py:233: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  segment_ids = torch.tensor(sent[:,2,:].clone().detach(),dtype=torch.long).cuda()
/root/autodl-tmp/sum/Multimodal-Summarization/oscar.py:235: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  decoder_input_ids = torch.tensor(title[:,0,:].clone().detach(),dtype=torch.long).cuda()
/root/autodl-tmp/sum/Multimodal-Summarization/oscar.py:236: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  decoder_input_mask = torch.tensor(title[:,1,:].clone().detach(),dtype=torch.long).cuda()
/root/autodl-tmp/sum/Multimodal-Summarization/oscar.py:237: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  decoder_segment_ids = torch.tensor(title[:,2,:].clone().detach(),dtype=torch.long).cuda()
[unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused215] [unused193] [unused193] [unused193] [unused193] [unused215] [unused193] [unused193] [unused193]

Then the program stopped running.

@miaomiaocoder
Author

        # input_ids = torch.tensor(sent[:,0,:].clone().detach(),dtype=torch.long).cuda()
        # input_mask = torch.tensor(sent[:,1,:].clone().detach(),dtype=torch.long).cuda()
        # segment_ids = torch.tensor(sent[:,2,:].clone().detach(),dtype=torch.long).cuda()

        # decoder_input_ids = torch.tensor(title[:,0,:].clone().detach(),dtype=torch.long).cuda()
        # decoder_input_mask = torch.tensor(title[:,1,:].clone().detach(),dtype=torch.long).cuda()
        # decoder_segment_ids = torch.tensor(title[:,2,:].clone().detach(),dtype=torch.long).cuda()
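
        # Rewritten per the warning's suggestion: call clone().detach() on the
        # existing tensor instead of copy-constructing it with torch.tensor().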

        input_ids = sent[:, 0, :].clone().detach().to(torch.long).cuda()
        input_mask = sent[:, 1, :].clone().detach().to(torch.long).cuda()
        segment_ids = sent[:, 2, :].clone().detach().to(torch.long).cuda()

        decoder_input_ids = title[:, 0, :].clone().detach().to(torch.long).cuda()
        decoder_input_mask = title[:, 1, :].clone().detach().to(torch.long).cuda()
        decoder_segment_ids = title[:, 2, :].clone().detach().to(torch.long).cuda()

Even with the warnings removed by this change, the program still stopped running.

Start to load Faster-RCNN detected objects from data/hm_vgattr5050.tsv
Loaded 20 images in file data/hm_vgattr5050.tsv in 0 seconds.
Use 20 data in torch dataset

Some weights of BertO were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['bert.img_embedding.weight', 'bert.img_embedding.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
UNEXPECTED:  []
MISSING:  ['bert.img_embedding.weight', 'bert.img_embedding.bias']
ERRORS:  []
Some weights of GPT2LMHeadModel were not initialized from the model checkpoint at gpt2 and are newly initialized: ['h.0.crossattention.bias', 'h.0.crossattention.masked_bias', 'h.0.crossattention.c_attn.weight', 'h.0.crossattention.c_attn.bias', 'h.0.crossattention.q_attn.weight', 'h.0.crossattention.q_attn.bias', 'h.0.crossattention.c_proj.weight', 'h.0.crossattention.c_proj.bias', 'h.0.ln_cross_attn.weight', 'h.0.ln_cross_attn.bias', 'h.1.crossattention.bias', 'h.1.crossattention.masked_bias', 'h.1.crossattention.c_attn.weight', 'h.1.crossattention.c_attn.bias', 'h.1.crossattention.q_attn.weight', 'h.1.crossattention.q_attn.bias', 'h.1.crossattention.c_proj.weight', 'h.1.crossattention.c_proj.bias', 'h.1.ln_cross_attn.weight', 'h.1.ln_cross_attn.bias', 'h.2.crossattention.bias', 'h.2.crossattention.masked_bias', 'h.2.crossattention.c_attn.weight', 'h.2.crossattention.c_attn.bias', 'h.2.crossattention.q_attn.weight', 'h.2.crossattention.q_attn.bias', 'h.2.crossattention.c_proj.weight', 'h.2.crossattention.c_proj.bias', 'h.2.ln_cross_attn.weight', 'h.2.ln_cross_attn.bias', 'h.3.crossattention.bias', 'h.3.crossattention.masked_bias', 'h.3.crossattention.c_attn.weight', 'h.3.crossattention.c_attn.bias', 'h.3.crossattention.q_attn.weight', 'h.3.crossattention.q_attn.bias', 'h.3.crossattention.c_proj.weight', 'h.3.crossattention.c_proj.bias', 'h.3.ln_cross_attn.weight', 'h.3.ln_cross_attn.bias', 'h.4.crossattention.bias', 'h.4.crossattention.masked_bias', 'h.4.crossattention.c_attn.weight', 'h.4.crossattention.c_attn.bias', 'h.4.crossattention.q_attn.weight', 'h.4.crossattention.q_attn.bias', 'h.4.crossattention.c_proj.weight', 'h.4.crossattention.c_proj.bias', 'h.4.ln_cross_attn.weight', 'h.4.ln_cross_attn.bias', 'h.5.crossattention.bias', 'h.5.crossattention.masked_bias', 'h.5.crossattention.c_attn.weight', 'h.5.crossattention.c_attn.bias', 'h.5.crossattention.q_attn.weight', 'h.5.crossattention.q_attn.bias', 'h.5.crossattention.c_proj.weight', 'h.5.crossattention.c_proj.bias', 'h.5.ln_cross_attn.weight', 'h.5.ln_cross_attn.bias', 'h.6.crossattention.bias', 'h.6.crossattention.masked_bias', 'h.6.crossattention.c_attn.weight', 'h.6.crossattention.c_attn.bias', 'h.6.crossattention.q_attn.weight', 'h.6.crossattention.q_attn.bias', 'h.6.crossattention.c_proj.weight', 'h.6.crossattention.c_proj.bias', 'h.6.ln_cross_attn.weight', 'h.6.ln_cross_attn.bias', 'h.7.crossattention.bias', 'h.7.crossattention.masked_bias', 'h.7.crossattention.c_attn.weight', 'h.7.crossattention.c_attn.bias', 'h.7.crossattention.q_attn.weight', 'h.7.crossattention.q_attn.bias', 'h.7.crossattention.c_proj.weight', 'h.7.crossattention.c_proj.bias', 'h.7.ln_cross_attn.weight', 'h.7.ln_cross_attn.bias', 'h.8.crossattention.bias', 'h.8.crossattention.masked_bias', 'h.8.crossattention.c_attn.weight', 'h.8.crossattention.c_attn.bias', 'h.8.crossattention.q_attn.weight', 'h.8.crossattention.q_attn.bias', 'h.8.crossattention.c_proj.weight', 'h.8.crossattention.c_proj.bias', 'h.8.ln_cross_attn.weight', 'h.8.ln_cross_attn.bias', 'h.9.crossattention.bias', 'h.9.crossattention.masked_bias', 'h.9.crossattention.c_attn.weight', 'h.9.crossattention.c_attn.bias', 'h.9.crossattention.q_attn.weight', 'h.9.crossattention.q_attn.bias', 'h.9.crossattention.c_proj.weight', 'h.9.crossattention.c_proj.bias', 'h.9.ln_cross_attn.weight', 'h.9.ln_cross_attn.bias', 'h.10.crossattention.bias', 'h.10.crossattention.masked_bias', 'h.10.crossattention.c_attn.weight', 'h.10.crossattention.c_attn.bias', 'h.10.crossattention.q_attn.weight', 
'h.10.crossattention.q_attn.bias', 'h.10.crossattention.c_proj.weight', 'h.10.crossattention.c_proj.bias', 'h.10.ln_cross_attn.weight', 'h.10.ln_cross_attn.bias', 'h.11.crossattention.bias', 'h.11.crossattention.masked_bias', 'h.11.crossattention.c_attn.weight', 'h.11.crossattention.c_attn.bias', 'h.11.crossattention.q_attn.weight', 'h.11.crossattention.q_attn.bias', 'h.11.crossattention.c_proj.weight', 'h.11.crossattention.c_proj.bias', 'h.11.ln_cross_attn.weight', 'h.11.ln_cross_attn.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[unused193] [unused193] [unused82] [unused193] [unused193] [unused193] [unused352] [unused193] [unused193] [unused193] [unused193] [unused215] [unused352] [unused92] [unused82] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused193] [unused352] [unused193] [unused193] [unused352] [unused193] [unused193] [unused193] [unused193]

Looking forward to your help.

@darthgera123
Owner

Hi, thanks for using the repo. There seems to be a version mismatch of some kind, as the weights are not being loaded properly. You can adapt the code to work with newer versions of the models.
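
For example, with a recent transformers release you can load the GPT-2 decoder with cross-attention enabled explicitly. This is a minimal sketch, assuming the same gpt2 checkpoint as in the log above; the h.*.crossattention.* weights reported as newly initialized are created by this flag and start untrained, which is expected until fine-tuning:

    from transformers import GPT2Config, GPT2LMHeadModel

    # add_cross_attention=True creates the h.*.crossattention.* parameters
    # that the log reports as newly initialized; they are trained from
    # scratch on the summarization data.
    config = GPT2Config.from_pretrained("gpt2", add_cross_attention=True)
    decoder = GPT2LMHeadModel.from_pretrained("gpt2", config=config)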

@miaomiaocoder
Author

Sorry, I still have this problem. Could you tell me which transformers version you used? I keep hitting this error; the environment I used was transformers=3.5. Thank you very much for your help!
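
For reference, here is a minimal check of the environment (a sketch that just prints the installed library versions):

    import torch
    import transformers

    # Print the installed versions so the environments can be compared.
    print("transformers:", transformers.__version__)
    print("torch:", torch.__version__)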

(vilio) root@autodl-container-30da119afa-1465d5de:~/autodl-tmp/sum/Multimodal-Summarization# python experiment.py --seed 42 --model O --train train --valid dev_seen --test dev_seen --lr 1e-5 --batchSize 4 --tr bert-base-uncased --epochs 5 --tsv --num_features 36  --contrib --exp O36 --topk 20
Start to load Faster-RCNN detected objects from data/hm_vgattr3636.tsv
Loaded 20 images in file data/hm_vgattr3636.tsv in 0 seconds.
Use 20 data in torch dataset

Some weights of BertO were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['bert.img_embedding.weight', 'bert.img_embedding.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
UNEXPECTED:  []
MISSING:  ['bert.img_embedding.weight', 'bert.img_embedding.bias']
ERRORS:  []
Some weights of GPT2LMHeadModel were not initialized from the model checkpoint at gpt2 and are newly initialized: ['h.0.crossattention.bias', 'h.0.crossattention.masked_bias', 'h.0.crossattention.c_attn.weight', 'h.0.crossattention.c_attn.bias', 'h.0.crossattention.q_attn.weight', 'h.0.crossattention.q_attn.bias', 'h.0.crossattention.c_proj.weight', 'h.0.crossattention.c_proj.bias', 'h.0.ln_cross_attn.weight', 'h.0.ln_cross_attn.bias', 'h.1.crossattention.bias', 'h.1.crossattention.masked_bias', 'h.1.crossattention.c_attn.weight', 'h.1.crossattention.c_attn.bias', 'h.1.crossattention.q_attn.weight', 'h.1.crossattention.q_attn.bias', 'h.1.crossattention.c_proj.weight', 'h.1.crossattention.c_proj.bias', 'h.1.ln_cross_attn.weight', 'h.1.ln_cross_attn.bias', 'h.2.crossattention.bias', 'h.2.crossattention.masked_bias', 'h.2.crossattention.c_attn.weight', 'h.2.crossattention.c_attn.bias', 'h.2.crossattention.q_attn.weight', 'h.2.crossattention.q_attn.bias', 'h.2.crossattention.c_proj.weight', 'h.2.crossattention.c_proj.bias', 'h.2.ln_cross_attn.weight', 'h.2.ln_cross_attn.bias', 'h.3.crossattention.bias', 'h.3.crossattention.masked_bias', 'h.3.crossattention.c_attn.weight', 'h.3.crossattention.c_attn.bias', 'h.3.crossattention.q_attn.weight', 'h.3.crossattention.q_attn.bias', 'h.3.crossattention.c_proj.weight', 'h.3.crossattention.c_proj.bias', 'h.3.ln_cross_attn.weight', 'h.3.ln_cross_attn.bias', 'h.4.crossattention.bias', 'h.4.crossattention.masked_bias', 'h.4.crossattention.c_attn.weight', 'h.4.crossattention.c_attn.bias', 'h.4.crossattention.q_attn.weight', 'h.4.crossattention.q_attn.bias', 'h.4.crossattention.c_proj.weight', 'h.4.crossattention.c_proj.bias', 'h.4.ln_cross_attn.weight', 'h.4.ln_cross_attn.bias', 'h.5.crossattention.bias', 'h.5.crossattention.masked_bias', 'h.5.crossattention.c_attn.weight', 'h.5.crossattention.c_attn.bias', 'h.5.crossattention.q_attn.weight', 'h.5.crossattention.q_attn.bias', 'h.5.crossattention.c_proj.weight', 'h.5.crossattention.c_proj.bias', 'h.5.ln_cross_attn.weight', 'h.5.ln_cross_attn.bias', 'h.6.crossattention.bias', 'h.6.crossattention.masked_bias', 'h.6.crossattention.c_attn.weight', 'h.6.crossattention.c_attn.bias', 'h.6.crossattention.q_attn.weight', 'h.6.crossattention.q_attn.bias', 'h.6.crossattention.c_proj.weight', 'h.6.crossattention.c_proj.bias', 'h.6.ln_cross_attn.weight', 'h.6.ln_cross_attn.bias', 'h.7.crossattention.bias', 'h.7.crossattention.masked_bias', 'h.7.crossattention.c_attn.weight', 'h.7.crossattention.c_attn.bias', 'h.7.crossattention.q_attn.weight', 'h.7.crossattention.q_attn.bias', 'h.7.crossattention.c_proj.weight', 'h.7.crossattention.c_proj.bias', 'h.7.ln_cross_attn.weight', 'h.7.ln_cross_attn.bias', 'h.8.crossattention.bias', 'h.8.crossattention.masked_bias', 'h.8.crossattention.c_attn.weight', 'h.8.crossattention.c_attn.bias', 'h.8.crossattention.q_attn.weight', 'h.8.crossattention.q_attn.bias', 'h.8.crossattention.c_proj.weight', 'h.8.crossattention.c_proj.bias', 'h.8.ln_cross_attn.weight', 'h.8.ln_cross_attn.bias', 'h.9.crossattention.bias', 'h.9.crossattention.masked_bias', 'h.9.crossattention.c_attn.weight', 'h.9.crossattention.c_attn.bias', 'h.9.crossattention.q_attn.weight', 'h.9.crossattention.q_attn.bias', 'h.9.crossattention.c_proj.weight', 'h.9.crossattention.c_proj.bias', 'h.9.ln_cross_attn.weight', 'h.9.ln_cross_attn.bias', 'h.10.crossattention.bias', 'h.10.crossattention.masked_bias', 'h.10.crossattention.c_attn.weight', 'h.10.crossattention.c_attn.bias', 'h.10.crossattention.q_attn.weight', 
'h.10.crossattention.q_attn.bias', 'h.10.crossattention.c_proj.weight', 'h.10.crossattention.c_proj.bias', 'h.10.ln_cross_attn.weight', 'h.10.ln_cross_attn.bias', 'h.11.crossattention.bias', 'h.11.crossattention.masked_bias', 'h.11.crossattention.c_attn.weight', 'h.11.crossattention.c_attn.bias', 'h.11.crossattention.q_attn.weight', 'h.11.crossattention.q_attn.bias', 'h.11.crossattention.c_proj.weight', 'h.11.crossattention.c_proj.bias', 'h.11.ln_cross_attn.weight', 'h.11.ln_cross_attn.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/root/autodl-tmp/sum/Multimodal-Summarization/oscar.py:231: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  input_ids = torch.tensor(sent[:,0,:].clone().detach(),dtype=torch.long).cuda()
/root/autodl-tmp/sum/Multimodal-Summarization/oscar.py:232: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  input_mask = torch.tensor(sent[:,1,:].clone().detach(),dtype=torch.long).cuda()
/root/autodl-tmp/sum/Multimodal-Summarization/oscar.py:233: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  segment_ids = torch.tensor(sent[:,2,:].clone().detach(),dtype=torch.long).cuda()
/root/autodl-tmp/sum/Multimodal-Summarization/oscar.py:235: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  decoder_input_ids = torch.tensor(title[:,0,:].clone().detach(),dtype=torch.long).cuda()
/root/autodl-tmp/sum/Multimodal-Summarization/oscar.py:236: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  decoder_input_mask = torch.tensor(title[:,1,:].clone().detach(),dtype=torch.long).cuda()
/root/autodl-tmp/sum/Multimodal-Summarization/oscar.py:237: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  decoder_segment_ids = torch.tensor(title[:,2,:].clone().detach(),dtype=torch.long).cuda()
[unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361] [unused361]
