End-to-end relation extraction for biomedical literature: full datasets, python API, and web GUI
The core annotators, pretrained models, and training data can be downloaded at pubmedKB core
Authors:
- Li Peng-Hsuan (李朋軒) @ ailabs.tw (jacobvsdanniel [at] gmail.com)
- Sun Yih-Yun (孫懿筠) @ ailabs.tw (jessie.yy.sun [at] gmail.com)
- Eunice You-Chi Liu (劉又綺) @ ailabs.tw (eunicecollege2019 [at] gmail.com)
This repo hosts the full datasets, python API, and web GUI for pubmedKB, a knowledge base created from annotations of PubMed. For the core annotators behind the knowledge base, see pubmedkb_core. For details, see our paper:
Peng-Hsuan Li, Ting-Fu Chen, Jheng-Ying Yu, Shang-Hung Shih, Chan-Hung Su, Yin-Hung Lin, Huai-Kuang Tsai, Hsueh-Fen Juan, Chien-Yu Chen and Jia-Hsin Huang, pubmedKB: an interactive web server to explore biomedical entity relations from biomedical literature, Nucleic Acids Research, 2022, https://doi.org/10.1093/nar/gkac310
Functions
- NEN
  - Look up names similar to the query input
  - Also return IDs and aliases for each name
- REL
  - Look up relations and evidence sentences for an entity or entity pair
  - Specify an entity by name and/or ID
Dependencies
- OS-independent
- python3
- Flask (a python package)
Support data | Content | Disk size (zip size) |
---|---|---|
gene | id, name correspondence | 226 MB (52 MB) |
variant | id, name, gene correspondence | 133 MB (28 MB) |
meta | title, author, year, journal, citation, IF | 8.6 GB (1.6 GB) |
paper | title, abstract, entity | 99 GB (13 GB) |
Relation data | Relations | # papers with relations* | Section | Disk size (zip size) | Memory usage |
---|---|---|---|---|---|
Full KB | odds ratio, causal, open relations, etc. | 8.5 M | abstract | 12 GB (2.2 GB) | 15 GB |
Partial KB | odds ratio, causal, open relations, etc. | 0.3 M | abstract | 487 MB (88 MB) | 10 GB |
*We processed all 35M PubMed citations dumped on 2023/02/17.
Check out the old version of pubmedkb_web to use these datasets.
git checkout 2e79a4bbf4258c88dda1ddc7f4e4f3ee37443896
Full dataset | zip size | #papers | section | memory-efficient | open access |
---|---|---|---|---|---|
pubmedKB-BERN-disk | 1.6 GB | 4.3 M | abstract | O | O |
pubmedKB-PTC-memory | 3.1 GB | 10.8 M | abstract | X | X |
pubmedKB-PTC-disk | 3.4 GB | 10.8 M | abstract | O | X |
pubmedKB-PTC-FT-disk | 3.7 GB | 1.7 M | full text | O | X |
Partial dataset | zip size | #papers | section | memory-efficient | open access |
---|---|---|---|---|---|
pubmedKB-BERN-disk-small | 336 MB | 884 K | abstract | O | O |
pubmedKB-PTC-disk-small | 605 MB | 2.0 M | abstract | O | O |
pubmedKB-PTC-FT-disk-small | 781 MB | 336 K | full text | O | O |
python server.py \
--gene_dir [gene_directory] \
--variant_dir [variant_directory] \
--meta_dir [meta_directory] \
--paper_dir [paper_directory] \
--kb_dir [KB_directory] \
--kb_type relation \
--port 8000
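When the server is started from a script rather than a shell, the same command can be assembled in Python. The following is a minimal sketch, not part of the repo: the flags are the ones shown above, while the helper name `build_server_command` and the example directory paths are placeholders chosen for illustration.

```python
import shlex
import sys


def build_server_command(gene_dir, variant_dir, meta_dir, paper_dir, kb_dir,
                         kb_type="relation", port=8000, python=sys.executable):
    """Return the argv list for launching server.py with the flags shown above.

    This helper is hypothetical; it only mirrors the documented command line.
    """
    return [
        python, "server.py",
        "--gene_dir", str(gene_dir),
        "--variant_dir", str(variant_dir),
        "--meta_dir", str(meta_dir),
        "--paper_dir", str(paper_dir),
        "--kb_dir", str(kb_dir),
        "--kb_type", str(kb_type),
        "--port", str(port),
    ]


if __name__ == "__main__":
    # Placeholder paths: point these at the unzipped dataset directories.
    cmd = build_server_command(
        gene_dir="data/gene",
        variant_dir="data/variant",
        meta_dir="data/meta",
        paper_dir="data/paper",
        kb_dir="data/kb",
    )
    # Print the shell-quoted command for inspection.
    print(shlex.join(cmd))
    # To actually start the server from the repo root, one could run:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems when a dataset path contains spaces.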
- Supports both HTTP GET and POST
- Displays results on an HTML webpage or returns a JSON file
- Open a browser and connect to [server_ip]:[server_port]
- Also check out client.py
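Since the server accepts both GET and POST and can return JSON, a script can query it with the Python standard library alone. The sketch below only illustrates the two request styles. The route `/query` and the parameter name `query` are assumptions made for illustration, not the server's actual API; the real routes and parameters are in client.py and server.py.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # i.e. [server_ip]:[server_port]
ROUTE = "/query"                    # hypothetical route; see client.py for the real ones


def build_get_request(params, base_url=BASE_URL, route=ROUTE):
    """Encode the parameters into the URL query string (HTTP GET)."""
    return Request(f"{base_url}{route}?{urlencode(params)}", method="GET")


def build_post_request(params, base_url=BASE_URL, route=ROUTE):
    """Send the parameters as a JSON body (HTTP POST)."""
    body = json.dumps(params).encode("utf-8")
    return Request(
        f"{base_url}{route}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(request, timeout=30):
    """Send a prepared request and decode the JSON response."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # "query" is a hypothetical parameter name used only for this sketch.
    request = build_get_request({"query": "BRCA1"})
    print(request.get_method(), request.full_url)
    # With a running server and the real route and parameters:
    #   result = fetch_json(request)
```

Building the request separately from sending it makes it possible to inspect the URL and body before any network traffic occurs.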