Training on WikiText-103

# download raw dataset
python scripts/datasets/wikitext-103/download.py
# tokenize using the GPT-2 tokenizer
python tokenize_dataset.py configs/tokenize/wikitext103_tiktoken_gpt2.yaml
# train
python train.py configs/train_gpt2/wikitext103.yaml
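
The three steps above depend on each other, so it can help to run them as one sequence that stops at the first failure. The following is a minimal sketch, not part of the repository: the function names `pipeline_steps` and `run_pipeline` are hypothetical, and it assumes you run it from the repository root with `python` on your `PATH`.

```shell
# Sketch only: pipeline_steps and run_pipeline are hypothetical helper names.
# Assumes the current directory is the repository root.

# Print the three commands from this guide, one per line, in order.
pipeline_steps() {
  cat <<'EOF'
python scripts/datasets/wikitext-103/download.py
python tokenize_dataset.py configs/tokenize/wikitext103_tiktoken_gpt2.yaml
python train.py configs/train_gpt2/wikitext103.yaml
EOF
}

# Run each step in turn; stop and return non-zero as soon as one fails.
run_pipeline() {
  pipeline_steps | while IFS= read -r step; do
    echo ">>> $step"
    # stdin is redirected so a step cannot consume the remaining command lines
    $step </dev/null || exit 1
  done
}
```

After sourcing the snippet, calling `run_pipeline` downloads, tokenizes, and trains in order. Calling `pipeline_steps` alone only prints the commands, which is a cheap way to check what would run.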