fixed num_workers #229

Merged
merged 2 commits into rasbt:main on Jun 19, 2024

Conversation

@d-kleine (Contributor) commented Jun 19, 2024

  • removed num_workers from GPTDatasetV1(Dataset) in ch04/01 and added it to DataLoader()
  • changed num_workers=0 in DataLoader() to num_workers=num_workers, as defined in create_dataloader_v1() (see the sketch below)
  • added num_workers to create_dataloader_v1 in ch06 and ch07
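
For context, a minimal sketch of the pattern this PR restores: num_workers is a parameter of create_dataloader_v1 and is forwarded to DataLoader rather than hardcoded to 0. GPTDatasetV1 is simplified here, and the exact signatures follow the book's ch02/ch04 code but may differ slightly:

import torch
import tiktoken
from torch.utils.data import Dataset, DataLoader

class GPTDatasetV1(Dataset):
    def __init__(self, txt, tokenizer, max_length, stride):
        token_ids = tokenizer.encode(txt)
        self.input_ids, self.target_ids = [], []
        # Slide a window over the token sequence to build (input, target) pairs
        for i in range(0, len(token_ids) - max_length, stride):
            self.input_ids.append(torch.tensor(token_ids[i:i + max_length]))
            self.target_ids.append(torch.tensor(token_ids[i + 1:i + max_length + 1]))

    def __len__(self):
        return len(self.input_ids)

    def __getitem__(self, idx):
        return self.input_ids[idx], self.target_ids[idx]

def create_dataloader_v1(txt, batch_size=4, max_length=256, stride=128,
                         shuffle=True, drop_last=True, num_workers=0):
    tokenizer = tiktoken.get_encoding("gpt2")
    dataset = GPTDatasetV1(txt, tokenizer, max_length, stride)
    # Forward num_workers instead of hardcoding num_workers=0 in DataLoader
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle,
                      drop_last=drop_last, num_workers=num_workers)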


@d-kleine (Contributor, Author) commented Jun 19, 2024

Please double check that everything is fine 🙂

@d-kleine marked this pull request as ready for review on June 19, 2024, 14:08
@rasbt (Owner) commented Jun 19, 2024

Oh thanks, this wasn't supposed to be hardcoded, otherwise it would defeat the purpose of the function argument!

@rasbt (Owner) commented Jun 19, 2024

Btw, in case you tried, how is the data loading with multiple workers for you on Windows? I remember a few years ago there were issues with that since Windows had some limitations with multiprocessing in Python. Not sure if something has changed since then.
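
As a side note on the Windows question: on Windows (and macOS), Python multiprocessing uses the "spawn" start method, so DataLoader worker processes re-import the main script. The usual requirement is to create the loader under an if __name__ == "__main__": guard. A minimal, hypothetical sketch (not the book's code):

import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Toy dataset standing in for the pretokenized text data
    dataset = TensorDataset(torch.arange(100).float().unsqueeze(1))
    loader = DataLoader(dataset, batch_size=8, num_workers=2)  # multi-process loading
    for (batch,) in loader:
        pass  # a training/eval step would go here

if __name__ == "__main__":  # required on spawn-based platforms so workers don't re-run this code
    main()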

@d-kleine (Contributor, Author) commented

I hadn't tested it before, but I just tried it. With the 124M model, I can see no big difference:
[screenshot: data loading timing with the 124M model]

@rasbt (Owner) commented Jun 19, 2024

Oh yeah, you might only notice a difference if you train on a GPU. But text loading is so cheap, especially because we pretokenize, that there will probably be only a very, very small difference. Thanks for checking, though; it seems like it now generally works on Windows without crashing :)
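
A hypothetical way to reproduce such a comparison is to time one pass over a DataLoader for different num_workers settings; with pretokenized data, the differences are expected to be small (the function and data below are made up for illustration):

import time
import torch
from torch.utils.data import DataLoader, TensorDataset

def time_one_epoch(num_workers, n_samples=10_000, batch_size=32):
    data = torch.randn(n_samples, 256)  # stand-in for pretokenized token-ID batches
    loader = DataLoader(TensorDataset(data), batch_size=batch_size,
                        num_workers=num_workers)
    start = time.perf_counter()
    for _ in loader:
        pass  # iterate once over all batches
    return time.perf_counter() - start

if __name__ == "__main__":
    for workers in (0, 2, 4):
        print(f"num_workers={workers}: {time_one_epoch(workers):.2f}s")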

@d-kleine (Contributor, Author) commented Jun 19, 2024

And here for GPT-2 XL (edit: updated):
[screenshot: data loading timing with GPT-2 XL]

@rasbt (Owner) commented Jun 19, 2024

I think the bottom one still uses the small model. Also, it's more the training time that would be affected (not the creation of the data loaders).

@d-kleine (Contributor, Author) commented

Thanks, screenshot updated above ⬆️

BTW, I just noticed that the dropout for the GPT-2 models changes between 0 and 0.1 across the chapters. Is this intended?
[screenshot: model config showing drop_rate]

@rasbt (Owner) commented Jun 19, 2024

BTW, I just noticed that the dropout for the GPT-2 models changes between 0 and 0.1 across the chapters. Is this intended?

Yes, in chapter 4 we use dropout 0.1 because that's the original setting OpenAI used in the GPT-2 paper. But nowadays it's not necessary/recommended to use dropout, so I am not using it during finetuning.
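
For illustration, a hedged sketch of the two settings being contrasted (the key names follow the book's config style, but the exact values here are assumptions):

# Pretraining config (chapter 4): dropout 0.1, matching the original GPT-2 setting
GPT_CONFIG_124M = {
    "vocab_size": 50257, "context_length": 1024, "emb_dim": 768,
    "n_heads": 12, "n_layers": 12, "drop_rate": 0.1, "qkv_bias": False,
}

# Finetuning config (later chapters): dropout disabled, since it is no longer
# considered necessary/recommended for LLMs
FINETUNE_CONFIG = dict(GPT_CONFIG_124M, drop_rate=0.0)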

@d-kleine (Contributor, Author) commented Jun 19, 2024

Please check ch05/03_bonus_pretraining_on_gutenberg/pretraining_simple.py. It's not technically relevant, but shouldn't the dropout be the same for the small debugging model as for the standard model ("drop_rate": 0.1)?

(...) so I am not using it during finetuning.

Ah, I see, thanks! I think it would be good to add this info in the text, or at least as a code comment where you set "drop_rate": 0.0, e.g.:

BASE_CONFIG = {
    ...
    "drop_rate": 0.0,  # deactivated, as dropout in LLMs is not recommended anymore
    ...
}
@d-kleine (Contributor, Author) commented Jun 19, 2024

I have just added num_workers to create_dataloader_v1 in ch06 and ch07 too, as it makes the previous_chapters.py files more consistent.

About the PR, is everything fine so far with the changes?

@rasbt (Owner) commented Jun 19, 2024

About the PR, is everything fine so far with the changes?

Yes, this looks awesome, many thanks!

@rasbt merged commit bbb2a0c into rasbt:main on Jun 19, 2024
5 checks passed
@d-kleine (Contributor, Author) commented

I have now tested num_workers in the training phase as well; there are only small, negligible improvements with the same batch size.

@rasbt (Owner) commented Jun 20, 2024

Interesting. Yeah, I think that's the same as what I originally observed on Linux + GPU (and also on macOS on CPU). I hypothesize that because the data is pretokenized, loading is quick no matter what. Thanks for looking into it!
