# BELB Benchmark

Code base to reproduce benchmarking experiments on BELB (Biomedical Entity Linking Benchmark) reported in:

```bibtex
@article{10.1093/bioinformatics/btad698,
    author = {Garda, Samuele and Weber-Genzel, Leon and Martin, Robert and Leser, Ulf},
    title = {{BELB}: a {B}iomedical {E}ntity {L}inking {B}enchmark},
    journal = {Bioinformatics},
    pages = {btad698},
    year = {2023},
    month = {11},
    issn = {1367-4811},
    doi = {10.1093/bioinformatics/btad698},
    url = {https://doi.org/10.1093/bioinformatics/btad698},
    eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btad698/53483107/btad698.pdf},
}
```

## Setup

We assume you have a working installation of [`belb`](https://github.com/sg-wbi/belb) in your Python environment:

```bash
git clone https://github.com/sg-wbi/belb
cd belb
pip install -e .
```

and have installed the other requirements:

```bash
(belb-venv) user $ pip install -r requirements.txt
```
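
As a quick optional sanity check that the editable install is visible from the active environment:

```bash
(belb-venv) user $ python -c "import belb; print(belb.__file__)"
```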

## Models

There are two types of models: rule-based, entity-specific systems and those based on pretrained language models (PLMs).

### Rule-based entity-specific

| Entity    | System     | Status                                  | Note   |
| --------- | ---------- | --------------------------------------- | ------ |
| Gene      | GNormPlus  |                                         | NER+EL |
| Species   | LINNAEUS   |                                         | NER+EL |
| Species   | SR4GN      |                                         | NER+EL |
| Species   | SPECIES    | Compilation fails                       |        |
| Disease   | TaggerOne  |                                         | NER+EL |
| Chemical  | BC7T2W     | Installation fails on Linux             | NER+EL |
| Variant   | tmVar (v3) |                                         | NER+EL |
| Cell line | TaggerOne  | Model not available and training fails  |        |
| UMLS      | MetaMap    |                                         | NER+EL |
| UMLS      | QuickUMLS  | Installation fails                      |        |
| UMLS      | SciSpacy   |                                         |        |

For each system there is a `run_*.sh` script in the `bin` folder. The script installs the software in a user-specified directory, runs the tool, and collects the output in the `data/results/` directory.

E.g. to run GNormPlus:

```bash
(belb) user $ chmod +x ./bin/run_gnormplus.sh
(belb) user $ ./bin/run_gnormplus.sh <BELB directory> <tool directory>
```
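
For instance, assuming your copy of BELB lives under `~/data/belb` and you want the tool installed under `~/tools` (both paths are placeholders, not defaults), the call and a quick look at the collected output would be:

```bash
(belb) user $ ./bin/run_gnormplus.sh ~/data/belb ~/tools
(belb) user $ ls data/results/
```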

### BC7T2W

See the instructions in the corresponding README.md.

### SciSpacy

```bash
(belb) user $ python -m benchmark.scispacy.scispacy --run output --in_dir test --belb_dir ~/data/belb
```

### MetaMap

See the instructions in the corresponding README.md.

### PLM-based

These types of models require training. We only provide code to create the input in the format each model expects and to parse the output each model generates. Detailed instructions on how to run these models on BELB can be found in the corresponding folders (e.g. `benchmark/arboel`).
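
The overall pattern is the same for each model; the sketch below uses hypothetical module names (the real entry points differ per model and are documented in each folder):

```bash
# Hypothetical sketch: module names and flags are illustrative, not the actual CLIs.
# 1. Export BELB corpora into the format the model expects
(belb) user $ python -m benchmark.arboel.create_input --belb_dir ~/data/belb
# 2. Train the model and generate predictions following the model's own instructions
# 3. Parse the predictions back into the shared results format for evaluation
(belb) user $ python -m benchmark.arboel.parse_output
```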