Code base to reproduce the benchmarking experiments on BELB (Biomedical Entity Linking Benchmark) reported in:
```bibtex
@article{10.1093/bioinformatics/btad698,
    author  = {Garda, Samuele and Weber-Genzel, Leon and Martin, Robert and Leser, Ulf},
    title   = {{BELB}: a {B}iomedical {E}ntity {L}inking {B}enchmark},
    journal = {Bioinformatics},
    pages   = {btad698},
    year    = {2023},
    month   = {11},
    issn    = {1367-4811},
    doi     = {10.1093/bioinformatics/btad698},
    url     = {https://doi.org/10.1093/bioinformatics/btad698},
    eprint  = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btad698/53483107/btad698.pdf},
}
```
We assume you have a working installation of belb in your Python environment:

```bash
git clone https://github.com/sg-wbi/belb
cd belb
pip install -e .
```
and that the other requirements are installed:

```bash
(belb-venv) user $ pip install -r requirements.txt
```
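To quickly verify the setup, you can run a minimal sanity check (this only assumes that the `belb` package is importable and that `pip` is available):

```bash
# Check that belb resolves in the active environment.
python -c "import belb; print('belb imported successfully')"
# Report any broken or conflicting dependencies.
pip check
```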
There are two types of models: rule-based, entity-specific systems and those based on pretrained language models (PLMs).
| Entity | System | Status | Note |
|---|---|---|---|
| Gene | GNormPlus | ✅ | NER+EL |
| Species | Linnaeus | ✅ | NER+EL |
| Species | SR4GN | ✅ | NER+EL |
| Species | SPECIES | ❌ | Compilation fails |
| Disease | TaggerOne | ✅ | NER+EL |
| Chemical | BC7T2W | ✅ | NER+EL; installation fails on Linux |
| Variant | tmVar (v3) | ✅ | NER+EL |
| Cell line | TaggerOne | ❌ | Model not available and training fails |
| UMLS | MetaMap | ✅ | NER+EL |
| UMLS | QuickUMLS | ❌ | Installation fails |
| UMLS | SciSpacy | ✅ | |
For each system there is a `run_*.sh` script in the `bin` folder. The script installs the software in the user-specified directory, runs the tool, and collects the output in the `data/results/` directory.
E.g. to run GNormPlus:

```bash
(belb) user $ chmod +x ./bin/run_gnormplus.sh
(belb) user $ ./bin/run_gnormplus.sh <BELB directory> <tool directory>
```
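If the run succeeds, the collected predictions should appear under `data/results/`. A quick way to inspect them (the per-tool subdirectory name below is a hypothetical example):

```bash
# List the collected outputs; "gnormplus" is a hypothetical subdirectory name.
ls data/results/
ls data/results/gnormplus/
```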
For SciSpacy, see the instructions in the corresponding README.md and run:

```bash
(belb) user $ python -m benchmark.scispacy.scispacy --run output --in_dir test --belb_dir ~/data/belb
```
These types of models require training. We only provide code to create the input in the format each model expects and to parse the output it generates. Detailed instructions on how to run these models on BELB can be found in the corresponding folders (e.g. `benchmark/arboel`).
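The overall workflow is the same for every PLM-based model. As a rough sketch, with hypothetical module names standing in for the actual scripts documented in each `benchmark/<model>` folder:

```bash
# 1. Create the model's input from BELB (hypothetical module name;
#    see the README.md in benchmark/<model> for the real entry point).
python -m benchmark.arboel.create_input --belb_dir ~/data/belb --out_dir ./input

# 2. Train the model and generate predictions following its own documentation.

# 3. Parse the raw predictions back for evaluation against BELB
#    (hypothetical module name).
python -m benchmark.arboel.parse_output --in_dir ./output --belb_dir ~/data/belb
```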