We are working on BERT pretraining with a custom corpus. I followed the data-preparation guide (the texts should be fine now), but running the notebook gives the following error:
2 items cleaning up...
Cleanup took 0.0017843246459960938 seconds
06/28/2020 11:53:45 - INFO - __main__ - Exiting context: ProjectPythonPath
Traceback (most recent call last):
  File "train.py", line 482, in <module>
    eval_loss = train(index)
  File "train.py", line 132, in train
    batch = next(dataloaders[dataset_type])
  File "train.py", line 47, in <genexpr>
    return (x for x in DataLoader(dataset, batch_size=train_batch_size // 2 if eval_set else train_batch_size,
  File "/opt/miniconda/envs/amlbert/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 615, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/opt/miniconda/envs/amlbert/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 232, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/opt/miniconda/envs/amlbert/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 232, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/opt/miniconda/envs/amlbert/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 209, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 144 and 128 in dimension 1 at /pytorch/aten/src/TH/generic/THTensorMoreMath.cpp:1307
which I don't fully understand in this context.
We also tried running with the English Wikipedia corpus and hit the same error, using both the large-cased and multilingual-cased vocabularies.
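For reference, the traceback shows the failure happens inside `default_collate`, which batches samples with `torch.stack`; `torch.stack` requires every tensor in the batch to have the same shape, so two examples of 144 and 128 tokens cannot be stacked. A minimal snippet reproducing the same kind of failure (the exact error wording depends on the PyTorch version):

```python
import torch
from torch.utils.data import DataLoader

# Two "encoded sentences" of different lengths (144 vs 128 tokens, as in
# the error above). The default collate_fn stacks samples with
# torch.stack, which requires identical shapes, so batching these two
# examples raises a RuntimeError.
dataset = [torch.zeros(144, dtype=torch.long),
           torch.zeros(128, dtype=torch.long)]

loader = DataLoader(dataset, batch_size=2)
batch = next(iter(loader))  # RuntimeError: sizes of tensors must match
```

If variable-length examples are expected at this point, one workaround is a custom `collate_fn` that pads each batch to its longest sequence instead of requiring equal lengths. This `pad_collate` helper is a hypothetical sketch, not code from the repo, and `padding_value=0` assumes 0 is the `[PAD]` token id in the vocabulary:

```python
from torch.nn.utils.rnn import pad_sequence

def pad_collate(samples):
    # Pad every sequence in the batch to the length of the longest one.
    return pad_sequence(samples, batch_first=True, padding_value=0)

loader = DataLoader(dataset, batch_size=2, collate_fn=pad_collate)
batch = next(iter(loader))  # shape: torch.Size([2, 144])
```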