I use the command shown below to train RotatE on an 11 GB GPU, and I ensured the GPU was completely free.
I still get the following error:
2022-03-31 19:32:37,370 INFO negative_adversarial_sampling = False
2022-03-31 19:32:37,370 INFO learning_rate = 0
2022-03-31 19:32:39,079 INFO Training average positive_sample_loss at step 0: 5.635527
2022-03-31 19:32:39,079 INFO Training average negative_sample_loss at step 0: 0.003591
2022-03-31 19:32:39,079 INFO Training average loss at step 0: 2.819559
2022-03-31 19:32:39,079 INFO Evaluating on Valid Dataset...
2022-03-31 19:32:39,552 INFO Evaluating the model... (0/2192)
2022-03-31 19:33:38,650 INFO Evaluating the model... (1000/2192)
2022-03-31 19:34:38,503 INFO Evaluating the model... (2000/2192)
2022-03-31 19:34:49,981 INFO Valid MRR at step 0: 0.005509
2022-03-31 19:34:49,982 INFO Valid MR at step 0: 6894.798660
2022-03-31 19:34:49,982 INFO Valid HITS@1 at step 0: 0.004733
2022-03-31 19:34:49,982 INFO Valid HITS@3 at step 0: 0.005076
2022-03-31 19:34:49,982 INFO Valid HITS@10 at step 0: 0.005646
Traceback (most recent call last):
File "codes/run.py", line 371, in <module>
main(parse_args())
File "codes/run.py", line 315, in main
log = kge_model.train_step(kge_model, optimizer, train_iterator, args)
File "/home/prachi/related_work/KnowledgeGraphEmbedding/codes/model.py", line 315, in train_step
loss.backward()
File "/home/prachi/anaconda3/envs/py36/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/prachi/anaconda3/envs/py36/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 1.95 GiB (GPU 0; 10.92 GiB total capacity; 7.41 GiB already allocated; 1.51 GiB free; 1.52 GiB cached)
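(A rough, speculative sanity check on the failed 1.95 GiB allocation: with negative sampling, the score computation materialises a tensor of shape (batch_size, negative_sample_size, hidden_dim), and a single float32 copy of such a tensor can already be on this order. The numbers below are illustrative assumptions, not the settings of the run above.)

```python
def score_tensor_bytes(batch_size, neg_size, hidden_dim, bytes_per_float=4):
    """Bytes held by one (batch, neg, dim) float32 tensor of negative scores."""
    return batch_size * neg_size * hidden_dim * bytes_per_float

# Illustrative (assumed) settings: batch 1024, 256 negatives, hidden dim 1000.
size_gib = score_tensor_bytes(1024, 256, 1000) / 2**30
# One such tensor is already close to 1 GiB; autograd typically keeps
# additional copies alive for the backward pass, multiplying this cost.
```

This is only a back-of-the-envelope estimate, but it suggests why halving the batch size or the negative sample size shrinks the peak allocation roughly proportionally.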
run.sh: line 79:
CUDA_VISIBLE_DEVICES=$GPU_DEVICE python -u $CODE_PATH/run.py --do_train \
--cuda \
--do_valid \
--do_test \
--data_path $FULL_DATA_PATH \
--model $MODEL \
-n $NEGATIVE_SAMPLE_SIZE -b $BATCH_SIZE -d $HIDDEN_DIM \
-g $GAMMA -a $ALPHA -adv \
-lr $LEARNING_RATE --max_steps $MAX_STEPS \
-save $SAVE --test_batch_size $TEST_BATCH_SIZE \
${14} ${15} ${16} ${17} ${18} ${19} ${20}
: No such file or directory
I get similar errors when trying to train on FB15k using the command in the best_config.sh file.
I reduced the batch size to 500 and it worked, but the performance is much lower than the numbers reported in the paper.
I am not sure what the issue is.
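One generic way to fit training into 11 GB without giving up the paper's effective batch size is gradient accumulation: run several smaller micro-batches, scale each loss, and call `optimizer.step()` once per group. This is a hedged PyTorch sketch, not the repository's actual training loop; `model` and `loss_fn` here are stand-ins for the KGE model and its loss.

```python
import torch

# Stand-in model and loss (assumptions for illustration only).
model = torch.nn.Linear(8, 1)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

effective_batch, micro_batch = 1024, 256
accum_steps = effective_batch // micro_batch  # 4 micro-batches per update

optimizer.zero_grad()
for _ in range(accum_steps):
    x = torch.randn(micro_batch, 8)
    y = torch.randn(micro_batch, 1)
    # Divide by accum_steps so the summed gradients match the average
    # gradient of one full effective_batch-sized batch.
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()  # gradients accumulate across micro-batches
optimizer.step()
optimizer.zero_grad()
```

Peak memory then scales with `micro_batch` rather than `effective_batch`, so the optimizer sees (approximately) the same gradient statistics as the large-batch run, which may help recover the reported numbers. Self-adversarial sampling weights are computed per micro-batch here, so the match to the original training dynamics is only approximate.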