This repo contains the work from my participation in the URP project COMETH at RPI.
Preprocess the corpus for fine-tuning
defineCorpus(publishers, passages, documents, output)
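A minimal sketch of what this preprocessing step could look like, assuming the corpus is a set of plain-text files grouped by publisher and that fine-tuning expects one passage per line. The file layout, the blank-line passage delimiter, and the simplified argument list are assumptions for illustration, not the repo's actual `defineCorpus` implementation.

```python
from pathlib import Path

def define_corpus(publishers, documents_dir, output_path):
    """Collect passages from each publisher's documents into one corpus file."""
    with open(output_path, "w", encoding="utf-8") as out:
        for publisher in publishers:
            for doc in Path(documents_dir, publisher).glob("*.txt"):
                text = doc.read_text(encoding="utf-8")
                # Treat blank-line-separated blocks as passages (an assumption).
                for passage in text.split("\n\n"):
                    passage = " ".join(passage.split())
                    if passage:
                        out.write(passage + "\n")

define_corpus(["publisher_a", "publisher_b"], "documents/", "corpus.txt")
```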
Use 80%, 10%, and 10% of the corpus for training, validation, and testing, respectively, to fine-tune the pretrained BERT model
fineTuneEmbedding(publishers, output)
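A minimal sketch of the 80/10/10 split and masked-LM fine-tuning using Hugging Face `transformers` and `datasets`. The checkpoint name, hyperparameters, and file paths are assumptions, not the values used in the repo.

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

dataset = load_dataset("text", data_files={"all": "corpus.txt"})["all"]
# 80% train, then split the remaining 20% evenly into validation and test.
split = dataset.train_test_split(test_size=0.2, seed=42)
holdout = split["test"].train_test_split(test_size=0.5, seed=42)
train, validation, test = split["train"], holdout["train"], holdout["test"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

train = train.map(tokenize, batched=True, remove_columns=["text"])
validation = validation.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-finetuned", num_train_epochs=3),
    train_dataset=train,
    eval_dataset=validation,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=True),
)
trainer.train()
trainer.save_model("bert-finetuned")
tokenizer.save_pretrained("bert-finetuned")  # needed to reload the model later
```

The test split is held out for final evaluation, e.g. via `trainer.evaluate(test)`.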
Combine the fine-tuned BERT models with the Flair Transformer embedding
combineModelFlair(publishers, documents, ignores, output)
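A minimal sketch of loading the fine-tuned BERT checkpoint through Flair's `TransformerWordEmbeddings` and stacking it with a classic Flair embedding. Whether the repo stacks an additional Flair embedding or only wraps the fine-tuned model is an assumption.

```python
from flair.data import Sentence
from flair.embeddings import (FlairEmbeddings, StackedEmbeddings,
                              TransformerWordEmbeddings)

# Wrap the fine-tuned checkpoint saved in the previous step.
bert = TransformerWordEmbeddings("bert-finetuned")
flair_forward = FlairEmbeddings("news-forward")
combined = StackedEmbeddings([bert, flair_forward])

sentence = Sentence("COMETH combines corpus-specific and general embeddings.")
combined.embed(sentence)
for token in sentence:
    print(token.text, token.embedding.shape)
```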
Generate embedding vectors with the Flair Transformer embedding and save them
saveEmbedding(embeddingVectors, output)
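A minimal sketch of persisting the embedding vectors. Storing them as a single stacked `torch` tensor is an assumption about the repo's output format, and the placeholder vectors stand in for tokens embedded in the previous step.

```python
import torch

def save_embedding(embedding_vectors, output_path):
    """Stack per-token vectors into one tensor and write it to disk."""
    torch.save(torch.stack(embedding_vectors), output_path)

# Placeholder 768-dim vectors; in practice these come from the Flair step above.
vectors = [torch.randn(768) for _ in range(5)]
save_embedding(vectors, "embeddings.pt")
```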