Pretraining (with CPUs) #660
I see in train_gpt2.cu there is a gpt2_build_from_random() for training from scratch. I can attempt to copy that into train_gpt2.c, but I'm not sure how easy it will be. Are any forks doing this? What I would like to see is code that is platform-independent (no reliance on Nvidia or AMD); if people have those devices (or ASICs) they can use optimized code, but there should be a fallback to the platform-independent code. Edit: I think this will do mainly what I want, though I still need to add a way to pass the model and training parameters on the command line: bitmarkcc@bdff450
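For reference, here is a minimal sketch of what a CPU-side build-from-random could look like, loosely in the spirit of gpt2_build_from_random() in train_gpt2.cu. The config struct, the rough parameter count, and the 0.02 init std below are illustrative assumptions, not the repo's exact code:

```c
/* Sketch only: a CPU-side "build from random" in the spirit of
 * gpt2_build_from_random() in train_gpt2.cu. The struct layout,
 * parameter count, and 0.02 init std are assumptions for illustration. */
#include <stdlib.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

typedef struct {
    int max_seq_len; // e.g. 1024
    int vocab_size;  // e.g. 50257
    int num_layers;  // e.g. 12 for GPT-2 124M
    int num_heads;   // e.g. 12
    int channels;    // e.g. 768
} GPT2Config;

// Box-Muller transform: one sample from N(0, 1) using rand()
static float randnf(void) {
    float u1 = ((float)rand() + 1.0f) / ((float)RAND_MAX + 2.0f);
    float u2 = ((float)rand() + 1.0f) / ((float)RAND_MAX + 2.0f);
    return sqrtf(-2.0f * logf(u1)) * cosf(2.0f * (float)M_PI * u2);
}

// GPT-2 initializes weights from N(0, 0.02^2)
float* gpt2_build_from_random(const GPT2Config* cfg, size_t* out_n) {
    size_t C = (size_t)cfg->channels;
    // rough parameter count: embeddings plus per-layer blocks
    size_t n = (size_t)cfg->vocab_size * C
             + (size_t)cfg->max_seq_len * C
             + (size_t)cfg->num_layers * 12 * C * C;
    float* params = (float*)malloc(n * sizeof(float));
    if (!params) return NULL;
    for (size_t i = 0; i < n; i++) params[i] = 0.02f * randnf();
    *out_n = n;
    return params;
}
```

With the GPT-2 124M config above, the crude count works out to roughly 120M floats, in the right ballpark for the real model.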
Hey @bitmarkcc! Did you follow the README? You should first run the Python code; it'll generate all the necessary bin/state files before you run the C/CUDA code. If something is not clearly explained in the README, either open up a PR fixing it or reply back here, happy to help.
Yeah, so according to the README these can be generated with train_gpt2.py, which references the official implementations of GPT-2 from OpenAI and HuggingFace. So these were generated from that Python script? And if you run the C program, does it reproduce the same bin files? In any case, I am still wondering whether my code is fine for how I implemented pretraining in CPU mode (bitmarkcc@bdff450). I want to make more changes and can open a pull request later on. Edit: I think it now actually randomizes the parameters (2nd commit): bitmarkcc@7581695
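On the earlier point about passing model and training parameters on the command line, a simple flag loop would do. The flag names and defaults below are hypothetical, just to show the shape:

```c
/* Sketch: parsing model/training hyperparameters from argv.
 * Flag names (-l, -c, -b, -n) and defaults are hypothetical. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    int num_layers = 12;   // -l
    int channels   = 768;  // -c
    int batch_size = 4;    // -b
    int num_steps  = 1000; // -n
    for (int i = 1; i < argc; i += 2) {
        if (i + 1 >= argc || argv[i][0] != '-') {
            fprintf(stderr, "usage: %s [-l layers] [-c channels] [-b batch] [-n steps]\n", argv[0]);
            return 1;
        }
        switch (argv[i][1]) {
            case 'l': num_layers = atoi(argv[i + 1]); break;
            case 'c': channels   = atoi(argv[i + 1]); break;
            case 'b': batch_size = atoi(argv[i + 1]); break;
            case 'n': num_steps  = atoi(argv[i + 1]); break;
            default:
                fprintf(stderr, "unknown flag %s\n", argv[i]);
                return 1;
        }
    }
    printf("layers=%d channels=%d batch=%d steps=%d\n",
           num_layers, channels, batch_size, num_steps);
    return 0;
}
```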
Anybody know where to get a nice SHA?
@gordicaleksa |
I'm new to deep learning but have some experience with training boosted decision trees.
Is this just for fine-tuning, or pretraining as well? When I look inside train_gpt2.c, I see the first thing it does is load weights from a bin file (gpt2_124M.bin). Where did this bin file come from? Is it an official file released by OpenAI? I would like to be able to start from scratch.
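For context, that loading step amounts to reading a fixed-size integer header followed by the flat parameter tensor. A hedged sketch follows; the 256-int header size and the magic value are assumptions from memory and should be checked against gpt2_build_from_checkpoint() in train_gpt2.c:

```c
/* Sketch: reading an llm.c-style model checkpoint on CPU.
 * The 256-int header and the 20240326 magic are assumptions;
 * verify against gpt2_build_from_checkpoint() in train_gpt2.c. */
#include <stdio.h>
#include <stdlib.h>

float* load_checkpoint(const char* path, size_t num_params) {
    FILE* f = fopen(path, "rb");
    if (!f) { fprintf(stderr, "could not open %s\n", path); return NULL; }
    int header[256];
    if (fread(header, sizeof(int), 256, f) != 256) { fclose(f); return NULL; }
    if (header[0] != 20240326) { // magic number: assumption, check the repo
        fprintf(stderr, "bad magic in %s\n", path);
        fclose(f);
        return NULL;
    }
    float* params = (float*)malloc(num_params * sizeof(float));
    if (params && fread(params, sizeof(float), num_params, f) != num_params) {
        free(params);
        params = NULL;
    }
    fclose(f);
    return params;
}
```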
I would like to first see how pretraining works, even if it's just on a small dataset, and it doesn't need GPUs. I would like to start with CPUs first, and maybe add CPU-only nodes that can each work on a part of the training.
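On the idea of CPU-only nodes each working on a part of the training: the usual scheme is data parallelism, where each node runs forward/backward on its own shard of the batch and the gradients are averaged before every optimizer step. A minimal single-process sketch of the averaging step (the node/gradient layout here is invented for illustration; a real multi-node setup would use an allreduce, e.g. via MPI):

```c
/* Sketch: data-parallel gradient averaging across CPU "nodes".
 * In a real multi-node setup this would be an allreduce (e.g. MPI);
 * here the nodes' gradient buffers are just arrays in one process. */
#include <stddef.h>

// average num_nodes gradient buffers of length n into avg_grad
void average_gradients(float** node_grads, int num_nodes,
                       size_t n, float* avg_grad) {
    for (size_t i = 0; i < n; i++) {
        float sum = 0.0f;
        for (int k = 0; k < num_nodes; k++) sum += node_grads[k][i];
        avg_grad[i] = sum / (float)num_nodes;
    }
    // every node then applies the same optimizer update with avg_grad,
    // which keeps all replicas' parameters in sync
}
```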