
# Pretraining of language models

This repository implements pretraining of Hugging Face language models using masked language modeling (MLM). It was created as part of a project for Digital Revisor, which wanted to rework their ML pipeline to accommodate languages other than English. The repository was used to create an ELECTRA model for Dutch.
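As a rough orientation, MLM pretraining with the Hugging Face `transformers` library typically follows the pattern below. This is a minimal sketch, not the repository's actual training script: the checkpoint name, dataset, and hyperparameters are illustrative assumptions (the real values are taken from `config/config.yaml`).

```python
# Minimal MLM pretraining sketch with Hugging Face Transformers.
# Checkpoint, dataset, and hyperparameters are placeholder assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "bert-base-multilingual-cased"  # assumed placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Assumed example corpus; the repository's dataset is set in config.yaml.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# The collator applies dynamic token masking for the MLM objective.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="mlm-pretraining",
        num_train_epochs=1,
        per_device_train_batch_size=8,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```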

All parameters for training, models, and datasets are set in `config/config.yaml`.
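For illustration only, a training script might read that file as sketched below; the section names shown here are hypothetical and not the repository's actual keys.

```python
# Hedged illustration of reading the YAML config; key names are assumptions.
import yaml

with open("config/config.yaml") as fh:
    config = yaml.safe_load(fh)

# e.g. nested sections for model, dataset, and training parameters
print(config)
```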

Developers: