Joint Extraction of Fact and Condition Tuples from Scientific Text

Introduction

This repository contains the source code for the EMNLP 2019 paper "Multi-Input Multi-Output Sequence Labeling for Joint Extraction of Fact and Condition Tuples from Scientific Text" (Paper).

Usage

1. Clone the Repository

git clone https://github.com/twjiang/MIMO_CFE.git

2. Download External Resources

  • The dumped MIMO can be found here.

  • The word embedding we use can be found here.

  • The pre-trained language model we use can be found here.

Put these files into the ./resources folder.

3. Install Requirements

This repo is tested on Python 3.6 and PyTorch 1.2.0.

Create an environment (optional): ideally, create a dedicated conda environment for the project.

conda create -n mimo python=3.6

conda activate mimo

pip install -r requirements.txt

4. Start a Demo Application

cd MIMO_service

python mimo_server.py  # start a MIMO service

python client.py 

The output of the demo is shown below.

{
	'statements': {
		'stmt 1': {
			'text': 'Histone deacetylase inhibitor valproic acid ( VPA ) has been used to increase the reprogramming efficiency of induced pluripotent stem cell ( iPSC ) from somatic cells , yet the specific molecular mechanisms underlying this effect is unknown .',
			'fact tuples': [
				['Histone deacetylase inhibitor valproic acid', 'NIL', 'has been used to increase', 'induced pluripotent stem cell', 'reprogramming efficiency'],
				['VPA', 'NIL', 'has been used to increase', 'induced pluripotent stem cell', 'reprogramming efficiency'],
				['Histone deacetylase inhibitor valproic acid', 'NIL', 'has been used to increase', 'induced pluripotent stem cell', 'reprogramming'],
				['specific molecular mechanisms', 'NIL', 'is unknown', 'NIL', 'NIL']
			],
			'condition tuples': [
				['iPSC', 'reprogramming efficiency', 'from', 'somatic cells', 'NIL'],
				['induced pluripotent stem cell', 'reprogramming efficiency', 'from', 'somatic cells', 'NIL'],
				['specific molecular mechanisms', 'NIL', 'underlying', 'NIL', 'effect']
			],
			'concept_indx': [0, 1, 2, 3, 4, 6, 17, 18, 19, 20, 22, 25, 26, 30, 31, 32],
			'attr_indx': [14, 15, 35],
			'predicate_indx': [8, 9, 10, 11, 12, 24, 33, 36, 37]
		}
	}
}
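
In the output above, the three index lists appear to be positions in the whitespace-tokenized 'text' field (for example, index 6 is 'VPA' and indices 14 and 15 are 'reprogramming efficiency'). The following is a minimal sketch, under that assumption, of turning an index list back into phrases. The helper name `phrases` is hypothetical and is not part of this repository.

```python
from typing import List

# Statement text and index lists copied from the demo output above.
TEXT = (
    "Histone deacetylase inhibitor valproic acid ( VPA ) has been used to "
    "increase the reprogramming efficiency of induced pluripotent stem cell "
    "( iPSC ) from somatic cells , yet the specific molecular mechanisms "
    "underlying this effect is unknown ."
)
CONCEPT_INDX = [0, 1, 2, 3, 4, 6, 17, 18, 19, 20, 22, 25, 26, 30, 31, 32]
ATTR_INDX = [14, 15, 35]
PREDICATE_INDX = [8, 9, 10, 11, 12, 24, 33, 36, 37]


def phrases(text: str, indices: List[int]) -> List[str]:
    """Join runs of consecutive token indices into phrases.

    Assumption: indices refer to tokens obtained by splitting on whitespace.
    """
    tokens = text.split()
    result: List[str] = []
    run: List[int] = []
    for i in sorted(indices):
        # A gap in the indices closes the current phrase.
        if run and i != run[-1] + 1:
            result.append(" ".join(tokens[j] for j in run))
            run = []
        run.append(i)
    if run:
        result.append(" ".join(tokens[j] for j in run))
    return result


print("concepts:  ", phrases(TEXT, CONCEPT_INDX))
print("attributes:", phrases(TEXT, ATTR_INDX))
print("predicates:", phrases(TEXT, PREDICATE_INDX))
```

Because the sketch only groups consecutive indices, two distinct spans that happen to be adjacent in the sentence would be merged into one phrase; the tuples themselves remain the authoritative output.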

5. Train Your Own MIMO

Example commands for pretraining:

(all gates for LM, pretrain)

python train.py --cuda --config 111000000 --model_name MIMO_BERT_LSTM --pretrain

(all gates for POS, pretrain)

python train.py --cuda --config 000111000 --model_name MIMO_BERT_LSTM --pretrain

(all gates for LM and POS, pretrain)

python train.py --cuda --config 111111000 --model_name MIMO_BERT_LSTM --pretrain

Example commands with multi-output:

(all gates for LM with multi-output)

python train.py --cuda --config 111000000 --model_name MIMO_BERT_LSTM

(all gates for POS with multi-output)

python train.py --cuda --config 000111000 --model_name MIMO_BERT_LSTM

(all gates for LM and POS, with multi-output)

python train.py --cuda --config 111111000 --model_name MIMO_BERT_LSTM
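
Judging only from the example commands above, the 9-character --config string seems to consist of three groups of three gate bits: the first group switches the LM gates and the second the POS gates. The meaning of the last group is not stated in this README. The sketch below decodes a config string under that reading; `describe_config` and the group label "unspecified" are hypothetical names used for illustration, not part of train.py.

```python
from typing import Dict, List

# Group labels inferred from the example commands; the third group's
# meaning is not documented in this README, so it is left unnamed.
GROUP_NAMES = ["LM", "POS", "unspecified"]


def describe_config(config: str) -> Dict[str, List[bool]]:
    """Split a 9-bit config string into three groups of gate switches."""
    if len(config) != 9 or any(c not in "01" for c in config):
        raise ValueError("config must be a string of nine 0/1 characters")
    return {
        name: [bit == "1" for bit in config[3 * i : 3 * i + 3]]
        for i, name in enumerate(GROUP_NAMES)
    }


for cfg in ("111000000", "000111000", "111111000"):
    print(cfg, describe_config(cfg))
```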

Reference

@inproceedings{jiang-mimo,
    title = "Multi-Input Multi-Output Sequence Labeling for Joint Extraction of Fact and Condition Tuples from Scientific Text",
    author = "Jiang, Tianwen and Zhao, Tong and Qin, Bing and Liu, Ting and Chawla, Nitesh V and Jiang, Meng",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
}