BERT(S) for Relation Extraction

Overview

A PyTorch implementation of the models for the paper "Matching the Blanks: Distributional Similarity for Relation Learning" published in ACL 2019.

Note: This is not an official repo for the paper.

Training by matching the blanks (BERTEM + MTB)

Run pretraining.py with a YAML --conf_file containing the following arguments:

# Data
data: data/cnn.txt # Path to the pre-training .txt file
normalization: # How to normalize the MTB pre-training corpus
  - lowercase # Apply lowercasing
  - html # Strip HTML tags
  - urls # Remove URLs
# Model
transformer: bert-base-uncased # Weight initialization (should be a Hugging Face BERT model)
# Training
batch_size: 32 # Training batch size
max_norm: 1.0 # Gradient clipping norm
epochs: 18 # Number of epochs
lr: 0.0001 # Learning rate
resume: False # Set to True to resume an interrupted training run
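
For orientation, here is a minimal sketch of how such a config could be consumed, assuming PyYAML; the actual argument handling in pretraining.py may differ:

import argparse

import yaml  # PyYAML

# Hypothetical loader for the --conf_file shown above (not the repo's actual code).
parser = argparse.ArgumentParser(description="MTB pre-training")
parser.add_argument("--conf_file", type=str, required=True, help="Path to the YAML config")
args = parser.parse_args()

with open(args.conf_file, "r") as f:
    config = yaml.safe_load(f)  # dict with keys such as data, transformer, batch_size, lr

print(config["transformer"], config["batch_size"], config["lr"])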

Pre-training data can be any continuous-text .txt file.
We use spaCy to extract pairwise entities (within a window of 40 tokens) from the text to form relation statements for pre-training. Entity recognition is based on NER and dependency-tree parsing of subjects/objects.
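
As a rough illustration of the pairing step (NER part only, assuming spaCy's en_core_web_sm model; the repository's pre-processing also uses dependency parsing and may differ in detail):

from itertools import combinations

import spacy

# Illustrative only: pair named entities that co-occur within a fixed token window.
nlp = spacy.load("en_core_web_sm")

def relation_statements(text, window=40):
    """Yield (entity1, entity2, sentence) tuples for entity pairs within `window` tokens."""
    doc = nlp(text)
    for sent in doc.sents:
        ents = list(sent.ents)  # named entities found by spaCy NER
        for e1, e2 in combinations(ents, 2):
            if abs(e1.start - e2.start) <= window:
                yield e1.text, e2.text, sent.text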

Fine-tuning on SemEval2010 Task 8 (BERTEM/BERTEM + MTB)

Run main_task.py with a YAML --conf_file containing the arguments below. This requires the SemEval2010 Task 8 dataset, available here. Download and unzip it into the data/sem_eval folder.

# Data
train_file: data/sem_eval/SemEval2010_task8_training/TRAIN_FILE.TXT
test_file: data/sem_eval/SemEval2010_task8_testing_keys/TEST_FILE_FULL.TXT
# Model
pretrained_mtb_model: models/MTB-pretraining/MTB-pretraining-small/bert-base-uncased/best_model.pth.tar
transformer: bert-base-uncased
# Training
batch_size: 64
max_norm: 1.0
epochs: 25
lr: 0.00007
resume: False
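
Each example in the SemEval files spans four lines: a tab-separated id plus the quoted sentence with <e1>/<e2> entity markers, the relation label, a Comment: line, and a blank line. A hypothetical reader sketch (not necessarily how main_task.py parses the files):

def read_semeval(path):
    """Return (sentence, relation) pairs from a SemEval2010 Task 8 file."""
    examples = []
    with open(path, encoding="utf-8") as f:
        lines = [line.strip() for line in f]
    for i in range(0, len(lines), 4):  # id/sentence, relation, comment, blank
        if not lines[i] or i + 1 >= len(lines):
            continue
        _, sentence = lines[i].split("\t", 1)
        examples.append((sentence.strip('"'), lines[i + 1]))
    return examples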

pretrained_mtb_model can be set to None to start from the pretrained BERT provided by the transformers package instead of an MTB pre-trained checkpoint.
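
A hypothetical illustration of the two initialization paths; the checkpoint layout (the "state_dict" key) is an assumption, not taken from the repo:

import torch
from transformers import BertModel

def init_encoder(pretrained_mtb_model, transformer="bert-base-uncased"):
    encoder = BertModel.from_pretrained(transformer)  # plain pretrained BERT weights
    if pretrained_mtb_model is not None:
        # Assumed checkpoint layout; overwrite the BERT weights with MTB-pre-trained ones.
        checkpoint = torch.load(pretrained_mtb_model, map_location="cpu")
        encoder.load_state_dict(checkpoint["state_dict"], strict=False)
    return encoder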
