Training only on 2B tokens (openwebtext) #5
Comments
Thanks for your reply. The training configurations you referred to seem to be set up for 600K training steps. I trained for 50K iterations; however, my val loss remained at ~3 (PPL > 30).
No, I did not change anything such as the learning rate or weight decay; I recall that my numbers are close to those reported in the original nanoGPT repo (https://github.com/karpathy/nanoGPT?tab=readme-ov-file#baselines).
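For reference, here is a minimal sketch of what such a nanoGPT-style override config might look like when only the iteration count is shortened. The values are assumptions based on nanoGPT's documented defaults, not the settings confirmed in this thread:

```python
# Hypothetical nanoGPT-style config override (e.g. a file passed to train.py).
# All values are illustrative assumptions, not the authors' actual configuration.
max_iters = 50_000       # 50K iterations instead of the default 600K
lr_decay_iters = 50_000  # usually kept equal to max_iters
learning_rate = 6e-4     # nanoGPT GPT-2 default (assumed unchanged)
min_lr = 6e-5            # typically learning_rate / 10
weight_decay = 1e-1      # nanoGPT default (assumed unchanged)
warmup_iters = 2000      # assumed; may need shortening for such a short run
```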
Hi!
Interesting work on the role of explicit bias!
I was wondering what training settings got you an eval PPL of ~3.04. The paper mentions that 50K iterations are required to train the GPT-2 model on 2B tokens. What were the batch_size_per_device and block_size for this run? Did you train from scratch or fine-tune the pre-trained model (trained on 300B tokens)?
Thanks!
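As a rough sanity check (not stated in the paper or this thread), the token budget follows directly from these knobs. The values below are assumptions chosen so that 50K iterations cover roughly 2B OpenWebText tokens:

```python
# Hypothetical back-of-the-envelope check: which combination of
# batch_size_per_device, block_size, grad-accum steps, and device count
# would yield ~2B tokens in 50K iterations. None of these numbers are
# confirmed in the thread.
batch_size_per_device = 8
block_size = 1024
gradient_accumulation_steps = 5
num_devices = 1
max_iters = 50_000

tokens_per_iter = (
    batch_size_per_device * num_devices * gradient_accumulation_steps * block_size
)  # 40,960 tokens per optimizer step
total_tokens = tokens_per_iter * max_iters
print(f"{tokens_per_iter:,} tokens/iter -> {total_tokens / 1e9:.2f}B tokens total")
# 40,960 tokens/iter -> 2.05B tokens total
```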