fixed num_workers #229

Merged
merged 2 commits into rasbt:main on Jun 19, 2024

Conversation

@d-kleine (Contributor) commented Jun 19, 2024

  • removed num_workers from GPTDatasetV1(Dataset) in ch04/01 and added it to DataLoader()
  • changed num_workers=0 in DataLoader() to num_workers=num_workers, as defined in create_dataloader_v1() (see the sketch below)
  • added num_workers to create_dataloader_v1 in ch06 and ch07
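
For context, a minimal sketch of the pattern this PR restores: num_workers is a parameter of create_dataloader_v1 and is forwarded to DataLoader rather than hardcoded to 0. GPTDatasetV1 is simplified here, and the exact signatures follow the book's ch02/ch04 code but may differ slightly:

import torch
import tiktoken
from torch.utils.data import Dataset, DataLoader

class GPTDatasetV1(Dataset):
    def __init__(self, txt, tokenizer, max_length, stride):
        token_ids = tokenizer.encode(txt)
        self.input_ids, self.target_ids = [], []
        # Slide a window over the token sequence to build (input, target) pairs
        for i in range(0, len(token_ids) - max_length, stride):
            self.input_ids.append(torch.tensor(token_ids[i:i + max_length]))
            self.target_ids.append(torch.tensor(token_ids[i + 1:i + max_length + 1]))

    def __len__(self):
        return len(self.input_ids)

    def __getitem__(self, idx):
        return self.input_ids[idx], self.target_ids[idx]

def create_dataloader_v1(txt, batch_size=4, max_length=256, stride=128,
                         shuffle=True, drop_last=True, num_workers=0):
    tokenizer = tiktoken.get_encoding("gpt2")
    dataset = GPTDatasetV1(txt, tokenizer, max_length, stride)
    # Forward num_workers instead of hardcoding num_workers=0 in DataLoader
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle,
                      drop_last=drop_last, num_workers=num_workers)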


@d-kleine (Contributor, Author) commented Jun 19, 2024

Please double check that everything is fine 🙂

@d-kleine marked this pull request as ready for review on June 19, 2024, 14:08
@rasbt (Owner) commented Jun 19, 2024

Oh thanks, this wasn't supposed to be hardcoded, otherwise it would defeat the purpose of the function argument!

@rasbt (Owner) commented Jun 19, 2024

Btw, in case you tried, how is the data loading with multiple workers for you on Windows? I remember a few years ago there were issues with that since Windows had some limitations with multiprocessing in Python. Not sure if something has changed since then.
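
As a side note on the Windows question: on Windows (and macOS), Python multiprocessing uses the "spawn" start method, so DataLoader worker processes re-import the main script. The usual requirement is to create the loader under an if __name__ == "__main__": guard. A minimal, hypothetical sketch (not the book's code):

import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Toy dataset standing in for the pretokenized text data
    dataset = TensorDataset(torch.arange(100).float().unsqueeze(1))
    loader = DataLoader(dataset, batch_size=8, num_workers=2)  # multi-process loading
    for (batch,) in loader:
        pass  # a training/eval step would go here

if __name__ == "__main__":  # required on spawn-based platforms so workers don't re-run this code
    main()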

@d-kleine (Contributor, Author) commented

I hadn't tested it before, but I just tried it. With the 124M model, I can see no big difference:
[screenshot: data loading timing with the 124M model]

@rasbt (Owner) commented Jun 19, 2024

Oh yeah, you might only notice a difference if you train on a GPU. But text loading is so cheap, especially because we pretokenize, that there will probably be only a very, very small difference. Thanks for checking, though; it seems like it now generally works on Windows without crashing :)
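
A hypothetical way to reproduce such a comparison is to time one pass over a DataLoader for different num_workers settings; with pretokenized data, the differences are expected to be small (the function and data below are made up for illustration):

import time
import torch
from torch.utils.data import DataLoader, TensorDataset

def time_one_epoch(num_workers, n_samples=10_000, batch_size=32):
    data = torch.randn(n_samples, 256)  # stand-in for pretokenized token-ID batches
    loader = DataLoader(TensorDataset(data), batch_size=batch_size,
                        num_workers=num_workers)
    start = time.perf_counter()
    for _ in loader:
        pass  # iterate once over all batches
    return time.perf_counter() - start

if __name__ == "__main__":
    for workers in (0, 2, 4):
        print(f"num_workers={workers}: {time_one_epoch(workers):.2f}s")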

@d-kleine (Contributor, Author) commented Jun 19, 2024

And here for GPT-2 XL (edit: updated):
[screenshot: data loading timing with GPT-2 XL]

@rasbt (Owner) commented Jun 19, 2024

I think the bottom one still uses the small model. Also, it's more the training time that would be affected (not the creation of the data loaders).

@d-kleine (Contributor, Author) commented

Thanks, screenshot updated above ⬆️

BTW, I just noticed that the dropout for the GPT-2 models changes between 0 and 0.1 across the chapters. Is this intended?
[screenshot: model config showing drop_rate]

@rasbt (Owner) commented Jun 19, 2024

BTW, I just noticed that the dropout for the GPT-2 models changes between 0 and 0.1 across the chapters. Is this intended?

Yes, in chapter 4 we use dropout 0.1 because that's the original setting OpenAI used in the GPT-2 paper. But nowadays it's not necessary/recommended to use dropout, so I am not using it during finetuning.
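
For illustration, a hedged sketch of the two settings being contrasted (the key names follow the book's config style, but the exact values here are assumptions):

# Pretraining config (chapter 4): dropout 0.1, matching the original GPT-2 setting
GPT_CONFIG_124M = {
    "vocab_size": 50257, "context_length": 1024, "emb_dim": 768,
    "n_heads": 12, "n_layers": 12, "drop_rate": 0.1, "qkv_bias": False,
}

# Finetuning config (later chapters): dropout disabled, since it is no longer
# considered necessary/recommended for LLMs
FINETUNE_CONFIG = dict(GPT_CONFIG_124M, drop_rate=0.0)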

@d-kleine (Contributor, Author) commented Jun 19, 2024

Please check ch05/03_bonus_pretraining_on_gutenberg/pretraining_simple.py. It's not technically relevant, but shouldn't the dropout be the same for the small debugging model as for the standard model ("drop_rate": 0.1)?

(...) so I am not using it during finetuning.

Ah, I see, thanks! I think it would be good to add this info in the text, or at least as a code comment where you set "drop_rate": 0.0, e.g.:

BASE_CONFIG = {
    ...
    "drop_rate": 0.0,  # deactivated, as dropout in LLMs is not recommended anymore
    ...
}
@d-kleine (Contributor, Author) commented Jun 19, 2024

I have just added num_workers to create_dataloader_v1 in ch06 and ch07 too, as it makes the previous_chapters.py files more consistent.

About the PR, is everything fine so far with the changes?

@rasbt (Owner) commented Jun 19, 2024

About the PR, is everything fine so far with the changes?

Yes, this looks awesome, many thanks!

@rasbt merged commit bbb2a0c into rasbt:main on Jun 19, 2024
5 checks passed
@d-kleine (Contributor, Author) commented

I have now tested num_workers in the training phase as well; there are only small, negligible improvements with the same batch size.

@rasbt (Owner) commented Jun 20, 2024

Interesting. Yeah, I think that's the same as what I originally observed on Linux + GPU (and also on macOS on CPU). I hypothesize that because the data is pretokenized, loading is quick no matter what. Thanks for looking into it!
