KeyError in collators.py line 164 #198

Open
kroll-software opened this issue Sep 21, 2024 · 2 comments
Labels
question (Further information is requested), stale (Inactive since 30 days or more)

Comments

@kroll-software

Hi, I'm training on the huge bread-midi-dataset.

Enumerating the dataloader built from a DatasetJSON throws a KeyError in collators.py, line 164 (length_of_first = batch[0].size(0)), when the batch is empty (len(batch) == 0).

The error is very rare: it occurred after many hours of training, during the first epoch.

I train with gradient accumulation and batch_size = 1.
The error only occurs with batch_size = 1; all larger batch sizes work.

I think these rare empty-batch cases should be handled somewhere in DataCollator.__call__().
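A minimal sketch of such a guard, under stated assumptions: the collator receives a list of sample dicts, drops samples whose token ids are None (an assumed failure mode that would let a batch of size 1 end up empty), and works on plain Python lists instead of tensors. SketchCollator, the "input_ids" key and returning None for an empty batch are illustrative choices, not MidiTok's actual DataCollator:

```python
from __future__ import annotations

from typing import Optional


class SketchCollator:
    """Hypothetical, simplified padding collator (NOT MidiTok's DataCollator).

    Sequences are plain lists of token ids instead of tensors.
    """

    def __init__(self, pad_token_id: int = 0, inputs_key: str = "input_ids") -> None:
        self.pad_token_id = pad_token_id
        self.inputs_key = inputs_key

    def __call__(self, batch: list[dict]) -> Optional[dict]:
        # Assumption: samples without token ids are dropped before padding,
        # so a batch of size 1 can end up with no sequences at all.
        seqs = [s[self.inputs_key] for s in batch if s.get(self.inputs_key) is not None]

        # Guard: check for emptiness before touching seqs[0].
        if len(seqs) == 0:
            return None  # signal "nothing to train on" to the caller

        length_of_first = len(seqs[0])
        if all(len(seq) == length_of_first for seq in seqs):
            return {self.inputs_key: [list(seq) for seq in seqs]}

        # Right-pad every sequence to the longest one in the batch.
        max_len = max(len(seq) for seq in seqs)
        padded = [list(seq) + [self.pad_token_id] * (max_len - len(seq)) for seq in seqs]
        return {self.inputs_key: padded}
```

With this design the training loop has to skip None batches (e.g. `if batch is None: continue`), and with gradient accumulation it probably should not count a skipped batch as an accumulation step.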

Hope it helps!

@Natooz
Owner

Natooz commented Sep 27, 2024

Thank you for the report!

Maybe the DataCollator class could implement a method that returns empty batches, which users could override to handle these cases according to their needs?
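A rough sketch of that idea, assuming list-based samples stored under an "input_ids" key. The hook name empty_batch(), both class names and the "skip" flag are hypothetical and do not exist in MidiTok. The base class returns an empty dict; a subclass overrides the hook to suit its own training loop:

```python
from __future__ import annotations


class HookedCollator:
    """Hypothetical collator exposing an overridable empty-batch hook."""

    def __init__(self, pad_token_id: int = 0) -> None:
        self.pad_token_id = pad_token_id

    def empty_batch(self) -> dict:
        # Default: an empty dict; subclasses choose what their model/loop expects.
        return {}

    def __call__(self, batch: list[dict]) -> dict:
        # Keep only samples that actually carry token ids.
        seqs = [s["input_ids"] for s in batch if s.get("input_ids") is not None]
        if not seqs:
            return self.empty_batch()
        # Right-pad every sequence to the longest one in the batch.
        max_len = max(len(seq) for seq in seqs)
        return {
            "input_ids": [
                list(seq) + [self.pad_token_id] * (max_len - len(seq)) for seq in seqs
            ]
        }


class SkipFlagCollator(HookedCollator):
    """Example override: mark the batch so the training loop can skip it."""

    def empty_batch(self) -> dict:
        return {"input_ids": [], "skip": True}
```

A loop using SkipFlagCollator could check batch.get("skip") before the forward pass; whether to skip, log, or compute a zero loss stays a per-model decision made in the subclass.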
The thing is, each model may handle empty batches differently, so there is no silver bullet here, no solution that fits all cases. The same goes for the trainer/training loop: if we encounter an empty batch, should the model still return a loss value?
As I see it, the best solution is probably to make the collator raise an error when a batch is empty, forcing users to curate their data and make sure it can be used to properly train a model/tokenizer.
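A minimal sketch of this fail-fast approach, assuming each sample is a dict holding a list of token ids under "input_ids" (None or empty when a file yields no tokens). EmptyBatchError, collate_strict() and find_empty_samples() are hypothetical names, not MidiTok API. The collator raises a descriptive error instead of failing on batch[0], and the helper lists the sample indices to remove when curating the data:

```python
from __future__ import annotations

from typing import Sequence


class EmptyBatchError(ValueError):
    """Hypothetical error raised when a batch contains no usable sample."""


def collate_strict(batch: list[dict], pad_token_id: int = 0) -> dict:
    """Pad a batch of token-id lists, failing loudly if nothing is left."""
    # Treat both None and empty lists as "no token ids".
    seqs = [s["input_ids"] for s in batch if s.get("input_ids")]
    if not seqs:
        raise EmptyBatchError(
            f"Empty batch: {len(batch)} sample(s) received, none with token ids. "
            "Curate the dataset to remove files that yield no tokens."
        )
    max_len = max(len(seq) for seq in seqs)
    return {"input_ids": [list(seq) + [pad_token_id] * (max_len - len(seq)) for seq in seqs]}


def find_empty_samples(samples: Sequence[dict]) -> list[int]:
    """Return indices of samples that would produce an empty batch on their own."""
    return [i for i, s in enumerate(samples) if not s.get("input_ids")]
```

Running find_empty_samples() once over the dataset before training surfaces problem files up front, rather than hours into the first epoch.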

@Natooz added the question label on Sep 27, 2024

This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label on Oct 19, 2024