fixed num_workers #229
Conversation
Please double check that everything is fine 🙂
Oh thanks, this wasn't supposed to be hardcoded, otherwise it would defeat the purpose of the function argument!
Btw, in case you tried, how is the data loading with multiple workers for you on Windows? I remember a few years ago there were issues with that since Windows had some limitations with multiprocessing in Python. Not sure if something has changed since then.
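For reference, the usual Windows caveat is that multiprocessing there uses process spawning rather than forking, so a `DataLoader` created with `num_workers > 0` has to sit under an `if __name__ == "__main__":` guard. A minimal standalone sketch of the safe pattern (not the repo's code):

```python
# Minimal standalone sketch (not the repo's code): on Windows, PyTorch starts
# DataLoader workers via process spawning, so a loader with num_workers > 0
# must be created under an `if __name__ == "__main__":` guard.
import torch
from torch.utils.data import DataLoader, TensorDataset

def build_loader(num_workers=0):
    data = TensorDataset(torch.arange(100).unsqueeze(1))
    return DataLoader(data, batch_size=8, shuffle=True, num_workers=num_workers)

if __name__ == "__main__":  # required on Windows when num_workers > 0
    loader = build_loader(num_workers=2)
    for batch in loader:
        pass  # iterate once to confirm the worker processes start correctly
```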
Oh yeah, you might only notice if you train on GPU. But text loading is so cheap, especially because we pretokenize, that there will probably be only a very, very small difference. Thanks for checking though, seems like it generally works now on Windows without crashing :)
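To illustrate why pretokenization keeps loading cheap, here is a simplified sketch of the assumed dataset structure (not the exact `GPTDatasetV1` code): everything expensive happens once up front, so each `__getitem__` call is just a tensor lookup and extra workers have little left to parallelize.

```python
# Assumed structure (simplified, not the exact GPTDatasetV1 code): all token IDs
# are turned into input/target tensors once in __init__, so __getitem__ is a
# plain lookup and extra loader workers have little work left to parallelize.
import torch
from torch.utils.data import Dataset

class PretokenizedDataset(Dataset):
    def __init__(self, token_ids, max_length, stride):
        self.inputs, self.targets = [], []
        for i in range(0, len(token_ids) - max_length, stride):
            self.inputs.append(torch.tensor(token_ids[i:i + max_length]))
            self.targets.append(torch.tensor(token_ids[i + 1:i + max_length + 1]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]  # no tokenization at load time
```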
I think the bottom one uses the small model still. It's also more the training time that would be affected (not the creation of the data loaders). |
Yes, in chapter 4 we use dropout 0.1 because that's the original setting OpenAI used in the GPT-2 paper. But nowadays it's not necessary/recommended to use dropout, so I am not using it during finetuning. |
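To make that dropout point concrete, a hypothetical config sketch (key names and sizes are illustrative, not copied from the repo): pretraining keeps the original GPT-2 `drop_rate` of 0.1, and finetuning simply overrides it to 0.0.

```python
# Hypothetical config sketch (key names and sizes are illustrative, not copied
# from the repo): pretraining keeps the original GPT-2 drop_rate of 0.1, while
# finetuning overrides it to 0.0 since dropout is skipped there.
PRETRAIN_CONFIG = {
    "context_length": 1024,
    "emb_dim": 768,
    "n_heads": 12,
    "n_layers": 12,
    "drop_rate": 0.1,  # original GPT-2 setting
}

FINETUNE_CONFIG = {**PRETRAIN_CONFIG, "drop_rate": 0.0}  # no dropout for finetuning
```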
Please check ch05/03_bonus_pretraining_on_gutenberg/pretraining_simple.py. It's not technically relevant, but shouldn't the dropout be the same for the small debugging model as for the standard model (…)?
Ah, I see, thanks! I think it would be good to add this info in the text or at least to the code as a code comment where you do …
I have just added it. About the PR, is everything fine so far with the changes?
Yes, this looks awesome, many thanks! |
Have now tested the `num_workers` change on Windows.
Interesting. Yeah, I think that's the same as what I originally observed on Linux + GPU (and also on macOS on CPU). I hypothesize that because the data is pretokenized, loading is quick no matter what. Thanks for looking into it!
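For anyone who wants to reproduce the comparison, a standalone timing sketch (not the benchmark used in this thread): random token IDs stand in for the pretokenized data, and one pass over the loader is timed for a few `num_workers` settings.

```python
# Standalone timing sketch (not the benchmark used in this thread): random token
# IDs stand in for the pretokenized data, and one pass over the loader is timed
# for a few num_workers settings.
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

def time_one_epoch(loader):
    start = time.time()
    for _ in loader:
        pass  # iterate only; no model involved
    return time.time() - start

if __name__ == "__main__":  # guard needed on Windows when num_workers > 0
    data = TensorDataset(torch.randint(0, 50257, (10_000, 256)))
    for workers in (0, 2, 4):
        loader = DataLoader(data, batch_size=8, num_workers=workers)
        print(f"num_workers={workers}: {time_one_epoch(loader):.2f}s")
```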
Summary of the changes in this PR:

- Fixed `num_workers` in ch04/01 for `GPTDatasetV1(Dataset)` and added it to `DataLoader()`: changed the hardcoded `num_workers=0` in `DataLoader()` to `num_workers=num_workers`, as defined in `create_dataloader_v1()`.
- Added `num_workers` to `create_dataloader_v1` in ch06 and ch07 as well.
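In code, the fix boils down to the following pattern (simplified sketch; the repo's `create_dataloader_v1` also builds the `GPTDatasetV1` from raw text, which is omitted here):

```python
# Simplified sketch of the fix (the repo's create_dataloader_v1 also builds the
# GPTDatasetV1 from raw text, which is omitted here): the num_workers argument
# is forwarded to DataLoader instead of the previously hardcoded num_workers=0.
from torch.utils.data import DataLoader

def create_dataloader_v1(dataset, batch_size=4, shuffle=True,
                         drop_last=True, num_workers=0):
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=shuffle,
        drop_last=drop_last,
        num_workers=num_workers,  # was: num_workers=0
    )
```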