Code base to reproduce the benchmarking experiments on BELB (Biomedical Entity Linking Benchmark) reported in:
```bibtex
@article{10.1093/bioinformatics/btad698,
    author  = {Garda, Samuele and Weber-Genzel, Leon and Martin, Robert and Leser, Ulf},
    title   = {{BELB}: a {B}iomedical {E}ntity {L}inking {B}enchmark},
    journal = {Bioinformatics},
    pages   = {btad698},
    year    = {2023},
    month   = {11},
    issn    = {1367-4811},
    doi     = {10.1093/bioinformatics/btad698},
    url     = {https://doi.org/10.1093/bioinformatics/btad698},
    eprint  = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btad698/53483107/btad698.pdf},
}
```
We assume you have a working installation of belb in your Python environment:

```bash
git clone https://github.com/sg-wbi/belb
cd belb
pip install -e .
```
and that the other requirements are installed:

```bash
(belb-venv) user $ pip install -r requirements.txt
```
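To quickly verify the setup, you can run a minimal sanity check (this only assumes that the `belb` package is importable and that `pip` is available):

```bash
# Check that belb resolves in the active environment.
python -c "import belb; print('belb imported successfully')"
# Report any broken or conflicting dependencies.
pip check
```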
There are two types of models: rule-based, entity-specific systems and those based on pretrained language models (PLMs).
| Entity | System | Status | Note |
|---|---|---|---|
| Gene | GNormPlus | ✅ | NER+EL |
| Species | Linnaeus | ✅ | NER+EL |
| Species | SR4GN | ✅ | NER+EL |
| Species | SPECIES | ❌ | Compilation fails |
| Disease | TaggerOne | ✅ | NER+EL |
| Chemical | BC7T2W | ✅ | NER+EL; installation fails on Linux |
| Variant | tmVar (v3) | ✅ | NER+EL |
| Cell line | TaggerOne | ❌ | Model not available and training fails |
| UMLS | MetaMap | ✅ | NER+EL |
| UMLS | QuickUMLS | ❌ | Installation fails |
| UMLS | SciSpacy | ✅ | |
For each system there is a `run_*.sh` script in the `bin` folder. The script installs the software in the user-specified directory, runs the tool, and collects the output in the `data/results/` directory.
E.g. to run GNormPlus:

```bash
(belb) user $ chmod +x ./bin/run_gnormplus.sh
(belb) user $ ./bin/run_gnormplus.sh <BELB directory> <tool directory>
```
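If the run succeeds, the collected predictions should appear under `data/results/`. A quick way to inspect them (the per-tool subdirectory name below is a hypothetical example):

```bash
# List the collected outputs; "gnormplus" is a hypothetical subdirectory name.
ls data/results/
ls data/results/gnormplus/
```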
For SciSpacy, see the instructions in the corresponding README.md and run:

```bash
(belb) user $ python -m benchmark.scispacy.scispacy --run output --in_dir test --belb_dir ~/data/belb
```
These types of models require training. We only provide code to create the input in the format each model expects and to parse the output it generates. Detailed instructions on how to run these models on BELB can be found in the corresponding folders (e.g. `benchmark/arboel`).
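The overall workflow is the same for every PLM-based model. As a rough sketch, with hypothetical module names standing in for the actual scripts documented in each `benchmark/<model>` folder:

```bash
# 1. Create the model's input from BELB (hypothetical module name;
#    see the README.md in benchmark/<model> for the real entry point).
python -m benchmark.arboel.create_input --belb_dir ~/data/belb --out_dir ./input

# 2. Train the model and generate predictions following its own documentation.

# 3. Parse the raw predictions back for evaluation against BELB
#    (hypothetical module name).
python -m benchmark.arboel.parse_output --in_dir ./output --belb_dir ~/data/belb
```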