Training only on 2B tokens (openwebtext) #5
Comments
Thanks for your reply. The training configurations you referred to seem to be set up for 600K training steps. I trained for 50K iterations; however, my val loss remained at ~3 (PPL > 30).
No, I did not change anything such as the learning rate or weight decay; I recall that my numbers are close to those reported in the original nanoGPT repo (https://github.com/karpathy/nanoGPT?tab=readme-ov-file#baselines).
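For reference, here is a minimal sketch of what such a nanoGPT-style override config might look like when only the iteration count is shortened. The values are assumptions based on nanoGPT's documented defaults, not the settings confirmed in this thread:

```python
# Hypothetical nanoGPT-style config override (e.g. a file passed to train.py).
# All values are illustrative assumptions, not the authors' actual configuration.
max_iters = 50_000       # 50K iterations instead of the default 600K
lr_decay_iters = 50_000  # usually kept equal to max_iters
learning_rate = 6e-4     # nanoGPT GPT-2 default (assumed unchanged)
min_lr = 6e-5            # typically learning_rate / 10
weight_decay = 1e-1      # nanoGPT default (assumed unchanged)
warmup_iters = 2000      # assumed; may need shortening for such a short run
```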
Hi!
Interesting work on the role of explicit bias!
I was wondering what training settings got you an eval PPL of ~3.04. The paper mentions that 50K iterations are required to train the GPT-2 model on 2B tokens. What were the batch_size_per_device and block_size for this run? Did you train from scratch or fine-tune the pre-trained model (trained on 300B tokens)?
Thanks!
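As a rough sanity check (not stated in the paper or this thread), the token budget follows directly from these knobs. The values below are assumptions chosen so that 50K iterations cover roughly 2B OpenWebText tokens:

```python
# Hypothetical back-of-the-envelope check: which combination of
# batch_size_per_device, block_size, grad-accum steps, and device count
# would yield ~2B tokens in 50K iterations. None of these numbers are
# confirmed in the thread.
batch_size_per_device = 8
block_size = 1024
gradient_accumulation_steps = 5
num_devices = 1
max_iters = 50_000

tokens_per_iter = (
    batch_size_per_device * num_devices * gradient_accumulation_steps * block_size
)  # 40,960 tokens per optimizer step
total_tokens = tokens_per_iter * max_iters
print(f"{tokens_per_iter:,} tokens/iter -> {total_tokens / 1e9:.2f}B tokens total")
# 40,960 tokens/iter -> 2.05B tokens total
```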