Hi, thanks for sharing the codebase for your work. I am trying to train the network on custom data, but I get the following error:
```
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [46,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [47,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [48,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [49,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [50,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [51,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [52,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [53,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [54,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [55,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [56,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [57,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [58,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [59,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [60,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [61,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [62,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [63,0,0] Assertion `input_val >= zero && input_val <= one` failed.
  0%|          | 1/15356 [00:01<5:05:31, 1.19s/it]
Traceback (most recent call last):
  File "/media/home/C/NgeNet/train.py", line 224, in <module>
    main()
  File "/media/home/C/NgeNet/train.py", line 124, in main
    loss_dict = model_loss(coords_src=coords_src,
  File "/home/anaconda3/envs/Negnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/home/C/NgeNet/losses/loss.py", line 130, in forward
    overlap_loss_v = 0.5 * self.overlap_loss(ol_scores_src, ol_gt_src) +
  File "/media/home/C/NgeNet/losses/loss.py", line 65, in overlap_loss
    weights[ol_gt > 0.5] = 1 - ratio
RuntimeError: CUDA error: device-side assert triggered

Process finished with exit code 1
```
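For context, this device-side assert is raised by `binary_cross_entropy` on CUDA, which requires every prediction (and target) to lie in [0, 1]; the line reported in `overlap_loss` is only where the already-failed kernel surfaces. Running with `CUDA_LAUNCH_BLOCKING=1` usually makes the traceback point at the real call. Below is a minimal sketch of a sanity check that could be dropped in front of the loss; the helper `check_bce_inputs` is mine, not part of the repository.

```python
# Minimal sketch, not the repo's code: fail with a readable message instead of
# a device-side assert when the overlap scores leave [0, 1] (usually NaN/Inf).
import torch

def check_bce_inputs(scores: torch.Tensor, gt: torch.Tensor, name: str = "ol_scores") -> None:
    if torch.isnan(scores).any() or torch.isinf(scores).any():
        raise ValueError(f"{name} contains NaN/Inf")
    lo, hi = scores.min().item(), scores.max().item()
    if lo < 0.0 or hi > 1.0:
        raise ValueError(f"{name} outside [0, 1]: min={lo:.4g}, max={hi:.4g}")
    if gt.min().item() < 0.0 or gt.max().item() > 1.0:
        raise ValueError("overlap ground truth outside [0, 1]")

# Example: call check_bce_inputs(ol_scores_src, ol_gt_src) right before
# self.overlap_loss(...) in losses/loss.py to localize the bad batch.
```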
It seems that during training the weight parameters become too large, which makes the variable `q_feats_local` too large; the leaky_relu output then becomes NaN, which causes an error in the backpropagation of the loss. Have you encountered this situation in your experiments?
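If the cause really is exploding activations turning into NaN, two standard mitigations are enabling autograd anomaly detection while debugging and clipping gradient norms during training. A self-contained sketch with a toy model follows; it is a stand-in for illustration, not NgeNet's actual training loop.

```python
# Toy example of the two mitigations; model, data, and hyperparameters are placeholders.
import torch
import torch.nn as nn

torch.autograd.set_detect_anomaly(True)  # debug only: reports the op that produced NaN

model = nn.Sequential(nn.Linear(8, 16), nn.LeakyReLU(0.1), nn.Linear(16, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCELoss()  # same [0, 1] input requirement that triggers the assert above

x, y = torch.randn(32, 8), torch.rand(32, 1)

optimizer.zero_grad()
loss = criterion(model(x), y)  # sigmoid keeps predictions inside [0, 1]
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # tame exploding gradients
optimizer.step()
```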