This repository implements pretraining of Hugging Face language models using masked language modeling (MLM). It was created as part of a project for Digital Revisor, which wanted to rework its ML pipeline to accommodate languages other than English. It was used to create an ELECTRA model for Dutch.
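For readers unfamiliar with MLM, here is a minimal sketch of the token-masking step in plain Python. It is not code from this repository. The function name `mask_tokens`, the `[MASK]` string, the 15% masking rate and the 80/10/10 replacement split are assumptions taken from the commonly used BERT recipe, and the actual values used here may differ.

```python
import random

MASK_TOKEN = "[MASK]"  # assumed mask symbol; real tokenizers define their own


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (inputs, labels) for one MLM training example.

    Hypothetical helper for illustration. Each token is selected with
    probability `mask_prob`. A selected token is replaced by MASK_TOKEN
    80% of the time, by a random vocabulary token 10% of the time, and
    left unchanged 10% of the time. Labels hold the original token at
    selected positions and None elsewhere.
    """
    rng = rng or random.Random()
    inputs, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            labels.append(token)  # the model must predict this token
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK_TOKEN)
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))
            else:
                inputs.append(token)
        else:
            labels.append(None)  # position is ignored by the loss
            inputs.append(token)
    return inputs, labels


if __name__ == "__main__":
    sentence = ["de", "kat", "zit", "op", "de", "mat"]
    masked, targets = mask_tokens(sentence, vocab=sentence, rng=random.Random(0))
    print(masked)
    print(targets)
```

During training, the loss is computed only at positions where the label is not `None`, so the model learns to recover the original tokens from their context.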
All parameters for training, models and datasets are set in config/config.yaml.
Developers:
- Anders Jess Pedersen ([email protected])