Usage
This document assumes that you have prepared the AIDA-YAGO2-dataset.tsv file as described on the [Data set](Data set) page.
Evaluating NEL data is complicated and relies on making a number of different choices. Thus, we split it into two subtasks: `prepare` and `evaluate`.
For a more in-depth example of usage, please see the `run_core_evaluation` script.
The aim is to adapt gold and system output so that they can be fairly compared with respect to:
- document selection: the original dataset includes `train`, `testa` and `testb`. We supply reference output on `testb`.
- mapping: Wikipedia is a "moving target" and evaluating on a different version can lead to different results. We supply mapping files from Wikipedia snapshots and fetched from their API.
- entity link normalisation: we consider the links `Tom Cruise`, `Tom_Cruise` and `http://en.wikipedia.org/wiki/Tom_Cruise` to be equivalent.
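The normalisation above can be sketched in a few lines of Python. This is an illustrative sketch, not CNE's actual implementation; the function name is an assumption, and only the URL prefix shown above is handled:

```python
from urllib.parse import unquote

# Prefix from the example above; a fuller implementation would also
# handle https URLs and other language editions.
WIKI_PREFIX = "http://en.wikipedia.org/wiki/"

def normalise_link(link):
    """Reduce the three equivalent forms above to one canonical title."""
    if link.startswith(WIKI_PREFIX):
        # Strip the URL prefix and decode percent-escapes.
        link = unquote(link[len(WIKI_PREFIX):])
    # Treat underscores and spaces as equivalent.
    return link.replace("_", " ").strip()
```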
For example, to prepare gold-standard data for evaluation:
```
./cne prepare -k ".*testb.*" -m mappings/map-testb-fromapi-20140227.tsv /path/to/AIDA-YAGO2-dataset.tsv > gold-api20140227.testb.txt
```
And prepare a system output in the same way:

```
./cne prepare -k ".*testb.*" -m mappings/map-testb-fromapi-20140227.tsv system.txt > system-api20140227.testb.txt
```
The main script produces various [evaluation measures](Evaluation measures). It takes the gold-standard annotation and system output in [AIDA/CoNLL format](Data format):
```
./cne evaluate -g gold-api20140227.testb.txt system-api20140227.testb.txt
```
Variations you may want to consider:
- some systems use gold-standard mentions; to compare against them, you would need to adapt your system to do the same
- some systems use score thresholds to select more or less confident links; you would need to adapt your system to output more or fewer links
The mapping file should contain lines corresponding to Wikipedia titles. The first column contains the newer title and any following tab-separated columns contain names that should map to the newer title (e.g., titles of redirect pages that point to the newer title).
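A parser for this format can be sketched in a few lines of Python; the function name and the in-memory representation are illustrative assumptions, not CNE's internals:

```python
def read_mapping(lines):
    """Parse tab-separated mapping lines into {older name: newer title}.

    The first column is the newer title; each further column is a name
    (e.g. a redirect title) that should map to that newer title.
    """
    mapping = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue  # no older names on this line
        newer = cols[0]
        for older in cols[1:]:
            mapping[older] = newer
    return mapping
```

To load an actual mapping file, pass an open file object: `with open(path) as f: mapping = read_mapping(f)`.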
We supply some mapping files, but the `fetch-mapping` script can be used to generate a current redirect mapping using the Wikipedia API:
```
./cne fetch-mapping GOLD.testb > MAP.testb
```
If you have an older Wikipedia snapshot, you can generate a mapping file in the same format to perform longitudinal analysis.
TagMe output from the BAT framework uses different tokenisation. To map this output into AIDA format over CoNLL tokens:
```
./cne knit -a TAGME.xml -t 0.289 GOLD.testb > TAGME.aida
```
Only annotations with scores greater than the threshold specified by `-t` are kept.
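The thresholding amounts to a simple filter with a strict comparison. A minimal sketch, assuming annotations are (mention, link, score) tuples (an illustrative representation, not CNE's):

```python
def apply_threshold(annotations, threshold):
    """Keep only annotations scoring strictly above the threshold,
    mirroring the strictly-greater-than behaviour described above."""
    return [(mention, link, score)
            for mention, link, score in annotations
            if score > threshold]
```

Note the comparison is strict: an annotation scoring exactly at the threshold is dropped.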
We can describe different types of errors:
- `wrong-link`: where we link a mention to the wrong KB node
- `link-as-nil`: where we fail to link a mention to the KB
- `nil-as-link`: where we link a mention that should not be linked
- `missing`: where we do not exactly detect a mention
- `extra`: where we detect a mention that is not exactly in the gold standard
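The five categories above can be sketched as a classification over aligned gold and system annotations. This is an illustrative sketch, assuming each side is a mapping from exact mention span to entity title (with `None` for NIL mentions); it is not CNE's implementation:

```python
def error_types(gold, system):
    """Return (error_type, span) pairs for each mention-level
    disagreement between gold and system annotations.

    gold and system map a mention span to an entity title, or to None
    for NIL; a span absent from a side was not detected on that side.
    """
    errors = []
    for span, g in gold.items():
        if span not in system:
            errors.append(("missing", span))      # not exactly detected
            continue
        s = system[span]
        if g is None and s is not None:
            errors.append(("nil-as-link", span))  # linked a NIL mention
        elif g is not None and s is None:
            errors.append(("link-as-nil", span))  # failed to link
        elif g != s:
            errors.append(("wrong-link", span))   # wrong KB node
    for span in system:
        if span not in gold:
            errors.append(("extra", span))        # not in gold standard
    return errors
```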
Running:

```
./cne analyze -g GOLD SYSTEM
```

gives us output like:
```
link-as-nil <doc_id> m"<mention>" g"<entity>" s"None"
wrong-link <doc_id> m"<mention>" g"<entity_a>" s"<entity_b>"
nil-as-link <doc_id> m"<mention>" g"None" s"<entity>"
extra <doc_id> m"<mention>" s"<entity>"
missing <doc_id> m"<mention>" g"<entity>"
```
We can produce a summary by supplying the `-s` option:
```
./cne analyze -s -g GOLD SYSTEM
```

giving us output like:
```
652 extra
114 link-as-nil
1306 missing
333 nil-as-link
606 wrong-link
```
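The summary amounts to counting per-instance error records by type. A minimal sketch, assuming error records are (type, mention) pairs as in the per-instance output above:

```python
from collections import Counter

def summarise(errors):
    """Count error records by type and format one line per type,
    in the same spirit as the -s summary shown above."""
    counts = Counter(kind for kind, _ in errors)
    return ["%6d %s" % (n, kind) for kind, n in sorted(counts.items())]
```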
We provide tools for removing original data from your system outputs: `unstitch` and `stitch`.
```
./cne unstitch SYSTEM > references/SYSTEM
./cne stitch -g GOLD references/SYSTEM > SYSTEM
```
Pip should be able to install directly from this repository:
```
mkdir some_project
cd some_project
virtualenv ve
source ve/bin/activate
pip install git+git://github.com/benhachey/conll03_nel_eval.git#egg=CNE
```