Error when training the HAN model #7

ZZKa · 2020-10-21T02:37:34Z

Have you tested the HAN model? When I train it I get RuntimeError: CUDA error: device-side assert triggered. What could be causing this?
Any help would be much appreciated, thank you!

Error log:
0it [00:00, ?it/s]Building prefix dict from the default dictionary ...
hierattnet
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model from cache /tmp/jieba.cache
Loading model cost 2.178 seconds.
Loading model cost 2.178 seconds.
Prefix dict has been built successfully.
Prefix dict has been built successfully.
50000it [06:22, 130.57it/s]
5000it [00:37, 132.38it/s]
10000it [01:20, 123.86it/s]HierAttNet(
(word_att_net): WordAttNet(
(dropout): Dropout(p=0.5, inplace=False)
(embedding): Embedding(144241, 300)
(rnn): GRU(300, 64, num_layers=2, batch_first=True, dropout=0.5, bidirectional=True)
)
(sent_att_net): SentAttNet(
(rnn): GRU(128, 64, num_layers=2, batch_first=True, dropout=0.5, bidirectional=True)
(fc): Linear(in_features=128, out_features=10, bias=True)
)
)
Trainable parameters: 398602
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [0,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [1,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [2,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [3,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [4,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [5,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [6,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [7,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [8,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block:
……
……
……
[14,0,0], thread: [78,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [79,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [80,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [81,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [82,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [83,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [84,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [85,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [86,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [87,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [88,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [89,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [90,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [91,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [92,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [93,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [94,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [95,0,0] Assertion srcIndex < srcSelectDimSize failed.

Traceback (most recent call last):
File "train.py", line 134, in
run('configs/multi_classification/han_config.json')
File "train.py", line 105, in run
main(config, use_transformers=False)
File "train.py", line 80, in main
trainer.train()
File "/home/work/zzk/text_classification/base/base_trainer.py", line 67, in train
result = self._train_epoch(epoch)
File "/home/work/zzk/text_classification/trainer/trainer.py", line 52, in _train_epoch
output = self.model(input_token_ids,bert_masks, seq_lens).squeeze(1)
File "/home/work/anaconda3/envs/zzk_torch_py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/work/zzk/text_classification/model/model.py", line 404, in forward
word_output, hidden = self.word_att_net(input_token_ids,seq_lens)
File "/home/work/anaconda3/envs/zzk_torch_py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/work/zzk/text_classification/model/model.py", line 473, in forward
packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, sorted_seq_lengths, batch_first=self.batch_first)
File "/home/work/anaconda3/envs/zzk_torch_py36/lib/python3.6/site-packages/torch/nn/utils/rnn.py", line 234, in pack_padded_sequence
lengths = torch.as_tensor(lengths, dtype=torch.int64)
RuntimeError: CUDA error: device-side assert triggered

ZZKa · 2020-10-21T03:04:04Z

I see that V2 has been released. Has the HAN model been removed?

Lizhen0628 · 2020-10-22T07:45:50Z

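@ZZKa
HAN has not been added to V2 yet; I will add it back when I have time.
Judging from your error output, my guess is that some token id is out of range for the embedding table, i.e. it is greater than or equal to the embedding's vocabulary size.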
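
One way to test this guess is to scan the token ids before they reach the model. The sketch below is only an illustration: the helper name find_out_of_range_ids and the toy batches are hypothetical, and the real input_token_ids would need to be converted to nested lists first (for a tensor, .tolist() does this). The only value taken from the log is the vocabulary size 144241 from Embedding(144241, 300). Note that CUDA reports device-side asserts asynchronously, so the traceback can point at a later line (here pack_padded_sequence) rather than at the embedding lookup itself.

```python
from typing import Iterable, List, Sequence, Tuple


def find_out_of_range_ids(
    batches: Iterable[Sequence[Sequence[int]]],
    num_embeddings: int,
) -> List[Tuple[int, int, int]]:
    """Return (batch_index, row_index, token_id) for every id outside [0, num_embeddings)."""
    bad = []
    for b, batch in enumerate(batches):
        for r, row in enumerate(batch):
            for token_id in row:
                # Valid embedding indices are 0 .. num_embeddings - 1.
                if not 0 <= token_id < num_embeddings:
                    bad.append((b, r, token_id))
    return bad


# Vocabulary size from the printed model: Embedding(144241, 300).
NUM_EMBEDDINGS = 144241

# Hypothetical toy batches standing in for the real input_token_ids.
toy_batches = [
    [[1, 5, 144240], [0, 0, 2]],
    [[3, 144241, 7]],  # 144241 is one past the last valid index
]

bad_ids = find_out_of_range_ids(toy_batches, NUM_EMBEDDINGS)
print(bad_ids)  # [(1, 0, 144241)]
```

If a scan like this reports any ids, the vocabulary used to build the embedding and the one used to encode the data are probably out of sync, for example because an unknown or padding token was added after the embedding was sized.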
