This is the training code we used for the first prototype models.
Notably, it's based on an old version of HuggingFace's run_clm.py
example, which was then adapted by the ColossalAI developers to make use of some of their optimizations. It was then slightly improved to be usable in real-world scenarios (TensorBoard support, proper checkpointing, etc.).
This is being committed for archiving purposes, but if you'd like to use it, it probably works. The TL;DR version is:
- Get all the dependencies installed.
  - I have not documented this properly, but installing `transformers` and `colossalai` should probably cover it (a sketch follows this list).
- Put your data in a file called `./data/train.json`.
  - It should be a file where each line is a JSON object containing a `text` field, which contains the actual text that will be tokenized and fed into the model in the training loop (see the example after this list).
- Adjust any relevant config parameters in `finetune.bash` and run it. If you're lucky, the training loop will eventually start!
  - Metrics should be logged to a `runs` folder inside the `OUTPUT_DIR` you've specified, so you can host a TensorBoard server there to watch them (an example command follows this list).
- When it's done, you'll probably want to get a proper HF model out of the training checkpoints. You can do that using the provided `convert_to_hf.py` utility script.
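
For the dependency step, something like this might work (versions were never pinned anywhere, so treat this as a starting point rather than a tested environment):

```bash
# Assumption: current PyPI releases of both packages still work with this
# script; you may need to hunt down older versions.
pip install transformers colossalai
```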
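
Here's an illustrative `./data/train.json` (the sample text is made up; only the `text` field name is what the script actually expects):

```bash
# Hypothetical sample data: one standalone JSON object per line,
# each with a "text" field.
cat > ./data/train.json << 'EOF'
{"text": "First training document goes here."}
{"text": "Second training document goes here."}
EOF
```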
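
To watch the metrics, point TensorBoard at that `runs` folder (assuming `OUTPUT_DIR` below is set to the same path you used in `finetune.bash`):

```bash
# Serves the logged training metrics on a local web UI (port 6006 by default).
tensorboard --logdir "$OUTPUT_DIR/runs"
```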
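
I haven't documented `convert_to_hf.py`'s exact arguments here; assuming it's a standard argparse script, its own help output should list them:

```bash
# Assumption: the script uses argparse, so --help prints its usage.
python convert_to_hf.py --help
```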