Enumerating the dataloader from a DatasetJSON throws a KeyError in collators.py

Hi, I'm training on the huge bread-midi-dataset.
Enumerating the dataloader built from a DatasetJSON throws a KeyError in collators.py, line 164: length_of_first = batch[0].size(0), when the batch is empty (len(batch) == 0).
The error is very rare; it appeared after many hours of training, during the first epoch.
I train with gradient accumulation and batch_size = 1. The error only happens with batch_size = 1; any larger batch size works fine.
I think these very rare empty-batch cases should be handled somewhere in DataCollator.__call__().
Hope it helps,
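In the meantime, a possible user-side workaround is to filter out empty samples before building the DataLoader. Below is a minimal, hypothetical sketch; the NonEmptySubset name and the assumption that each item exposes an "input_ids" sequence are mine, not part of the library:

```python
from torch.utils.data import DataLoader, Dataset


class NonEmptySubset(Dataset):
    """Expose only the items of a wrapped dataset whose token sequence is non-empty."""

    def __init__(self, dataset):
        self.dataset = dataset
        # Assumes each item is a mapping with an "input_ids" sequence; adjust this
        # to the actual sample format returned by DatasetJSON.
        self.indices = [
            i for i in range(len(dataset)) if len(dataset[i]["input_ids"]) > 0
        ]

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, idx):
        return self.dataset[self.indices[idx]]


# Hypothetical usage, with `dataset` and `collator` standing in for the real objects:
# loader = DataLoader(NonEmptySubset(dataset), batch_size=1, collate_fn=collator)
```

Note that the index scan in __init__ touches every sample once, which can take a while on a dataset of this size.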
Maybe the DataCollator class could implement a method for handling empty batches that users could override to deal with this case according to their needs?
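As a rough sketch of that idea (hypothetical names and behaviour, not the current API), the collator could route empty batches through a dedicated method that subclasses override:

```python
import torch


class DataCollatorSketch:
    """Illustrative only: a collator with an overridable hook for empty batches."""

    def __init__(self, pad_token_id: int = 0):
        self.pad_token_id = pad_token_id

    def on_empty_batch(self):
        # Default: return an empty batch; a subclass could instead raise, log,
        # or build a dummy batch that its model/training loop accepts.
        return {"input_ids": torch.empty(0, 0, dtype=torch.long)}

    def __call__(self, batch):
        if len(batch) == 0:
            return self.on_empty_batch()
        # Regular path: pad the 1-D token tensors to the same length and stack them.
        input_ids = torch.nn.utils.rnn.pad_sequence(
            batch, batch_first=True, padding_value=self.pad_token_id
        )
        return {"input_ids": input_ids}
```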
The thing is, every model can handle empty batches differently, so there is no silver bullet or solution that fits all cases. The same goes for the trainer/training loop: if we encounter an empty batch, should the model still return a loss value?
As I see it, the best solution is probably to make the collator raise an error on an empty batch, to force the user to curate their data and make sure it can properly train a model/tokenizer.
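A minimal guard along those lines could look like the fragment below (a sketch of the top of the __call__ method, not the actual code; the error message is made up):

```python
def __call__(self, batch):
    # Fail fast with an explicit message instead of an opaque indexing error
    # further down, on the line the traceback points at.
    if len(batch) == 0:
        raise ValueError(
            "The collator received an empty batch; check the dataset for empty "
            "samples and remove them before training."
        )
    length_of_first = batch[0].size(0)
    ...
```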