This repository contains codes and models clinical concept extraction described in our paper https://arxiv.org/abs/1810.10566. It is designed for a clinical concept extraction task such as the 2010 i2b2/VA shared task.
The package is tested in Python 3.6. To begin with, install tensorflow
according to this instruction and
pip install git+https://github.com/noc-lab/clinical_concept_extraction.git
Next, create a folder, say cce_assets
, and set an environment variable CCE_ASSETS
to the path of the folder. Download the pretrained ELMo model here and unzip files to the folder. Currently, we don't provide the pretrained LSTM model using I2B2 data due to i2b2 license. But we provide a silver model here. We use the gold model trained using all training and test data in 2010 i2b2/VA shared task to generate silver annotations for 2000 discharge summaries in MIMIC-III. Then we fit the these data and get the silver model. Finally, the files should be structured as follows:
cce_assets
├── blstm
│ ├── checkpoint
│ ├── model.data-00000-of-00001
│ └── model.index
└── elmo
├── mimic_wiki.hdf5
├── options.json
└── vocab.txt
An example of how to use the package is shown here
If you use the code, please cite this paper:
@article{zhu2018clinical,
title={Clinical Concept Extraction with Contextual Word Embedding},
author={Zhu, Henghui and Paschalidis, Ioannis Ch and Tahmasebi, Amir},
journal={arXiv preprint arXiv:1810.10566},
year={2018}
}