IPRE: a Dataset for Inter-Personal Relationship Extraction
Files | Contents |
---|---|
sent_train/dev/test | sentID eh et sentence |
sent_relation_train/dev/test | sentID relationIDs |
bag_relation_train/dev/test | bagID eh et sentIDs relationIDs |
The above table shows the format of our data set. Some keywords are explained as follows:
- eh: The head entity in the sentence.
- et: The tail entity in the sentence.
- sentence: A segmented sentence that contains eh and et.
- sentID: A unique ID given to an instance, which consists of an ordered entity pair and a sentence.
- bagID: A unique ID given to a bag; all sentences in a bag contain the same entity pair.
- relationID: A unique ID given to a relation type.
Note that the fields named by the keywords above are separated by a tab character. The relationIDs field consists of one or more relationID values separated by a space character, and the sentIDs field is formatted the same way.
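The line formats described above can be parsed with a few small helpers. This is a minimal sketch, not part of the official release: the names `Sentence`, `Bag`, `parse_sentence_line`, `parse_sent_relation_line` and `parse_bag_line` are hypothetical, IDs are kept as strings because their type is not stated here, and the files are assumed to have no header row.

```python
from typing import List, NamedTuple, Tuple


class Sentence(NamedTuple):
    sent_id: str
    head: str       # eh
    tail: str       # et
    sentence: str   # segmented sentence containing eh and et


class Bag(NamedTuple):
    bag_id: str
    head: str
    tail: str
    sent_ids: List[str]
    relation_ids: List[str]


def parse_sentence_line(line: str) -> Sentence:
    """Parse one line of sent_*: sentID, eh, et, sentence (tab-separated)."""
    # maxsplit=3 keeps the sentence intact even if it happened to contain a tab.
    sent_id, head, tail, sentence = line.rstrip("\n").split("\t", 3)
    return Sentence(sent_id, head, tail, sentence)


def parse_sent_relation_line(line: str) -> Tuple[str, List[str]]:
    """Parse one line of sent_relation_*: sentID, then space-separated relationIDs."""
    sent_id, relation_ids = line.rstrip("\n").split("\t")
    return sent_id, relation_ids.split(" ")


def parse_bag_line(line: str) -> Bag:
    """Parse one line of bag_relation_*: bagID, eh, et, sentIDs, relationIDs."""
    bag_id, head, tail, sent_ids, relation_ids = line.rstrip("\n").split("\t")
    return Bag(bag_id, head, tail, sent_ids.split(" "), relation_ids.split(" "))


# Made-up example line, for illustration only (not taken from the dataset).
example_bag = parse_bag_line("b0\tAlice\tBob\ts0 s1\t2 5\n")
```

Keeping the IDs as strings avoids assuming they are numeric; convert them with `int` only after checking the actual files.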
Each split (train, dev, and test) consists of three files: one file stores the mapping between sentID and instance, and the other two store the data at the sentence level and the bag level, respectively.
For example, the test set contains `sent_test.txt`, `sent_relation_test.txt` and `bag_relation_test.txt`.
The mapping between sentID and instance in the test set is stored in `sent_test.txt`, and the test data at the sentence level and the bag level are stored in `sent_relation_test.txt` and `bag_relation_test.txt`, respectively.
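The three files of a split can be joined through sentID. The sketch below assumes UTF-8 text files with no header row, and assumes that the train and dev files follow the same naming pattern as the test files (`sent_train.txt`, and so on); `read_tsv` and `load_split` are hypothetical helper names, not part of the dataset release.

```python
import os
from typing import Dict, List, Tuple


def read_tsv(path: str) -> List[List[str]]:
    """Read a tab-separated file into a list of rows, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n").split("\t") for line in f if line.strip()]


def load_split(data_dir: str, split: str = "test"):
    """Load one split and resolve each bag's sentIDs to its instances."""
    # sentID -> (eh, et, sentence)
    sentences: Dict[str, Tuple[str, str, str]] = {
        row[0]: (row[1], row[2], row[3])
        for row in read_tsv(os.path.join(data_dir, f"sent_{split}.txt"))
    }
    # sentID -> list of relationIDs (sentence-level labels)
    sent_labels: Dict[str, List[str]] = {
        row[0]: row[1].split(" ")
        for row in read_tsv(os.path.join(data_dir, f"sent_relation_{split}.txt"))
    }
    # Bag-level data, with the sentences of each bag looked up by sentID.
    bags = []
    for bag_id, head, tail, sent_ids, relation_ids in read_tsv(
        os.path.join(data_dir, f"bag_relation_{split}.txt")
    ):
        ids = sent_ids.split(" ")
        bags.append({
            "bag_id": bag_id,
            "head": head,
            "tail": tail,
            "sentences": [sentences[sid][2] for sid in ids],
            "relation_ids": relation_ids.split(" "),
        })
    return sentences, sent_labels, bags
```

A call such as `load_split("data", "test")` would then return the instance mapping, the sentence-level labels and the bag-level data, where `"data"` stands in for wherever the files were unpacked.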
If you use the IPRE data, please cite the following paper:
- Haitao Wang, Zhengqiu He, Jin Ma, Wenliang Chen, Min Zhang. 2019. IPRE: a Dataset for Inter-Personal Relationship Extraction. https://arxiv.org/abs/1907.12801