Hi, thanks for sharing the codebase for your work. I am trying to train the network on custom data, but I get the following error:
```
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [46,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [47,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [48,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [49,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [50,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [51,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [52,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [53,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [54,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [55,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [56,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [57,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [58,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [59,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [60,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [61,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [62,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [63,0,0] Assertion `input_val >= zero && input_val <= one` failed.
  0%|          | 1/15356 [00:01<5:05:31, 1.19s/it]
Traceback (most recent call last):
  File "/media/home/C/NgeNet/train.py", line 224, in <module>
    main()
  File "/media/home/C/NgeNet/train.py", line 124, in main
    loss_dict = model_loss(coords_src=coords_src,
  File "/home/anaconda3/envs/Negnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/home/C/NgeNet/losses/loss.py", line 130, in forward
    overlap_loss_v = 0.5 * self.overlap_loss(ol_scores_src, ol_gt_src) +
  File "/media/home/C/NgeNet/losses/loss.py", line 65, in overlap_loss
    weights[ol_gt > 0.5] = 1 - ratio
RuntimeError: CUDA error: device-side assert triggered

Process finished with exit code 1
```
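For context, this device-side assert is raised by `binary_cross_entropy` on CUDA, which requires every prediction (and target) to lie in [0, 1]; the line reported in `overlap_loss` is only where the already-failed kernel surfaces. Running with `CUDA_LAUNCH_BLOCKING=1` usually makes the traceback point at the real call. Below is a minimal sketch of a sanity check that could be dropped in front of the loss; the helper `check_bce_inputs` is mine, not part of the repository.

```python
# Minimal sketch, not the repo's code: fail with a readable message instead of
# a device-side assert when the overlap scores leave [0, 1] (usually NaN/Inf).
import torch

def check_bce_inputs(scores: torch.Tensor, gt: torch.Tensor, name: str = "ol_scores") -> None:
    if torch.isnan(scores).any() or torch.isinf(scores).any():
        raise ValueError(f"{name} contains NaN/Inf")
    lo, hi = scores.min().item(), scores.max().item()
    if lo < 0.0 or hi > 1.0:
        raise ValueError(f"{name} outside [0, 1]: min={lo:.4g}, max={hi:.4g}")
    if gt.min().item() < 0.0 or gt.max().item() > 1.0:
        raise ValueError("overlap ground truth outside [0, 1]")

# Example: call check_bce_inputs(ol_scores_src, ol_gt_src) right before
# self.overlap_loss(...) in losses/loss.py to localize the bad batch.
```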
It seems that during training the weight parameters become too large, which makes the variable `q_feats_local` too large; the leaky_relu output then becomes NaN, which causes an error in the backpropagation of the loss. Have you encountered this situation in your experiments?
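If the cause really is exploding activations turning into NaN, two standard mitigations are enabling autograd anomaly detection while debugging and clipping gradient norms during training. A self-contained sketch with a toy model follows; it is a stand-in for illustration, not NgeNet's actual training loop.

```python
# Toy example of the two mitigations; model, data, and hyperparameters are placeholders.
import torch
import torch.nn as nn

torch.autograd.set_detect_anomaly(True)  # debug only: reports the op that produced NaN

model = nn.Sequential(nn.Linear(8, 16), nn.LeakyReLU(0.1), nn.Linear(16, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCELoss()  # same [0, 1] input requirement that triggers the assert above

x, y = torch.randn(32, 8), torch.rand(32, 1)

optimizer.zero_grad()
loss = criterion(model(x), y)  # sigmoid keeps predictions inside [0, 1]
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # tame exploding gradients
optimizer.step()
```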