Blog: https://blog.csdn.net/u014403221/article/details/135471423?csdn_share_tail=%7B%22type%22%3A%22blog%22%2C%22rType%22%3A%22article%22%2C%22rId%22%3A%22135471423%22%2C%22source%22%3A%22u014403221%22%7D

gpt2-pretrain

Pretraining a Chinese GPT2 model

train_tokenizer.py # trains a BPE tokenizer

train_sentencePiece_tokenizer.py # trains a SentencePiece tokenizer

tmp.py # tests the tokenizer

run_clm.py # trains GPT2

process_data.py # consolidates the training data

inference.py # runs inference
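The README does not show how train_tokenizer.py works internally. As a rough illustration of what "training a BPE tokenizer" means, here is a minimal character-level sketch in pure Python: it repeatedly counts adjacent symbol pairs across the corpus and merges the most frequent pair into a new symbol. This is a toy under stated assumptions, not the repository's code. GPT-2's real tokenizer is byte-level and is normally trained with a dedicated library. The function names (`get_pair_counts`, `merge_pair`, `train_bpe`) are hypothetical, and treating each line as one unit is an assumption made because Chinese text has no spaces to split words on.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Symbols = Tuple[str, ...]
Pair = Tuple[str, str]


def get_pair_counts(vocab: Dict[Symbols, int]) -> Counter:
    """Count adjacent symbol pairs, weighted by how often each sequence occurs."""
    pairs: Counter = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(vocab: Dict[Symbols, int], pair: Pair) -> Dict[Symbols, int]:
    """Replace every non-overlapping occurrence of `pair` with one merged symbol."""
    merged: Dict[Symbols, int] = {}
    for symbols, freq in vocab.items():
        out: List[str] = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus: Iterable[str], num_merges: int):
    """Learn up to `num_merges` merge rules, starting from single characters.

    Assumption (hypothetical toy setup): each non-empty line is one unit.
    """
    vocab: Dict[Symbols, int] = dict(Counter(tuple(line) for line in corpus if line))
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break  # every unit is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        vocab = merge_pair(vocab, best)
    return merges, vocab
```

For example, `train_bpe(["中文模型", "中文数据", "中文"], num_merges=1)` learns the single merge `("中", "文")`, because that pair occurs three times while every other pair occurs once. The learned merge list, applied in order, is what turns new text into subword tokens.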

Training data

https://download.csdn.net/download/u014403221/88755559

https://download.csdn.net/download/u014403221/88761912
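The README lists process_data.py as the step that consolidates the data, but it does not describe the layout of the downloaded files. The sketch below is one possible way to do that step, assuming the raw data is a set of UTF-8 `.txt` files with one sample per line. It merges them into a single JSON Lines file with a `"text"` field. It also assumes this repository's run_clm.py follows the Hugging Face `transformers` example script, which reads a `"text"` column from a `.json`, `.csv`, or `.txt` `--train_file`. The function names, the directory layout, and the `min_chars` filter are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def to_jsonl_records(lines: Iterable[str], min_chars: int = 10) -> Iterator[str]:
    """Strip each line, drop empty or too-short ones, and emit one JSON object per kept line."""
    for line in lines:
        text = line.strip()
        if len(text) >= min_chars:
            # ensure_ascii=False keeps Chinese characters readable instead of \uXXXX escapes
            yield json.dumps({"text": text}, ensure_ascii=False)


def merge_text_files(input_dir: str, output_path: str, min_chars: int = 10) -> int:
    """Concatenate every .txt file under input_dir into one JSON Lines file.

    Returns the number of records written. Undecodable bytes are skipped.
    """
    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(input_dir).rglob("*.txt")):
            with open(path, encoding="utf-8", errors="ignore") as f:
                for record in to_jsonl_records(f, min_chars):
                    out.write(record + "\n")
                    count += 1
    return count
```

A call such as `merge_text_files("data/raw", "train.json")` (hypothetical paths) would produce a file that could then be passed to run_clm.py via `--train_file train.json`. The `.json` extension is used rather than `.jsonl` because the upstream example script checks the file extension.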