Usage
This document assumes that you have prepared the AIDA-YAGO2-dataset.tsv file as described on the [Data set](Data set) page.
Evaluating NEL data is complicated and relies on making a number of different choices. Thus, we split it into two subtasks: `prepare` and `evaluate`.
For a more in-depth example of usage, please see the `run_core_evaluation` script.
The aim is to adapt gold and system output so that they can be fairly compared with respect to:
- document selection: the original dataset includes `train`, `testa` and `testb`. We supply reference output on `testb`.
- mapping: Wikipedia is a "moving target" and evaluating on a different version can lead to different results. We supply mapping files from Wikipedia snapshots and fetched from their API.
- entity link normalisation: we consider the links `Tom Cruise`, `Tom_Cruise` and `http://en.wikipedia.org/wiki/Tom_Cruise` to be equivalent.
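The normalisation above can be sketched in a few lines of Python. This is an illustrative sketch, not CNE's actual implementation; the function name is an assumption, and only the URL prefix shown above is handled:

```python
from urllib.parse import unquote

# Prefix from the example above; a fuller implementation would also
# handle https URLs and other language editions.
WIKI_PREFIX = "http://en.wikipedia.org/wiki/"

def normalise_link(link):
    """Reduce the three equivalent forms above to one canonical title."""
    if link.startswith(WIKI_PREFIX):
        # Strip the URL prefix and decode percent-escapes.
        link = unquote(link[len(WIKI_PREFIX):])
    # Treat underscores and spaces as equivalent.
    return link.replace("_", " ").strip()
```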
For example, to prepare gold-standard data for evaluation:
```
./cne prepare -k ".*testb.*" -m mappings/map-testb-fromapi-20140227.tsv /path/to/AIDA-YAGO2-dataset.tsv > gold-api20140227.testb.txt
```
And prepare a system output in the same way:

```
./cne prepare -k ".*testb.*" -m mappings/map-testb-fromapi-20140227.tsv system.txt > system-api20140227.testb.txt
```
The main script produces various [evaluation measures](Evaluation measures). It takes the gold-standard annotation and system output in [AIDA/CoNLL format](Data format):
```
./cne evaluate -g gold-api20140227.testb.txt system-api20140227.testb.txt
```
Variations you may want to consider:
- some systems use gold-standard mentions; to compare against them, you would need to adapt your system to do the same
- some systems use score thresholds to select more or less confident links; you would need to adapt your system to output more or fewer links
The mapping file should contain lines corresponding to Wikipedia titles. The first column contains the newer title and any following tab-separated columns contain names that should map to the newer title (e.g., titles of redirect pages that point to the newer title).
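A parser for this format can be sketched in a few lines of Python; the function name and the in-memory representation are illustrative assumptions, not CNE's internals:

```python
def read_mapping(lines):
    """Parse tab-separated mapping lines into {older name: newer title}.

    The first column is the newer title; each further column is a name
    (e.g. a redirect title) that should map to that newer title.
    """
    mapping = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue  # no older names on this line
        newer = cols[0]
        for older in cols[1:]:
            mapping[older] = newer
    return mapping
```

To load an actual mapping file, pass an open file object: `with open(path) as f: mapping = read_mapping(f)`.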
We supply some mapping files, but the `fetch-mapping` script can be used to generate a current redirect mapping using the Wikipedia API:
```
./cne fetch-mapping GOLD.testb > MAP.testb
```
If you have an older Wikipedia snapshot, you can generate a mapping file in the same format to perform longitudinal analysis.
TagMe output from the BAT framework uses different tokenisation. To map this output into AIDA format over CoNLL tokens:
```
./cne knit -a TAGME.xml -t 0.289 GOLD.testb > TAGME.aida
```
Only annotations with scores greater than the threshold specified by `-t` are kept.
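The thresholding amounts to a simple filter with a strict comparison. A minimal sketch, assuming annotations are (mention, link, score) tuples (an illustrative representation, not CNE's):

```python
def apply_threshold(annotations, threshold):
    """Keep only annotations scoring strictly above the threshold,
    mirroring the strictly-greater-than behaviour described above."""
    return [(mention, link, score)
            for mention, link, score in annotations
            if score > threshold]
```

Note the comparison is strict: an annotation scoring exactly at the threshold is dropped.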
We can describe different types of errors:
- `wrong-link`: where we link a mention to the wrong KB node
- `link-as-nil`: where we fail to link a mention to the KB
- `nil-as-link`: where we link a mention that should not be linked
- `missing`: where we do not exactly detect a mention
- `extra`: where we detect a mention that is not exactly in the gold standard
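The five categories above can be sketched as a classification over aligned gold and system annotations. This is an illustrative sketch, assuming each side is a mapping from exact mention span to entity title (with `None` for NIL mentions); it is not CNE's implementation:

```python
def error_types(gold, system):
    """Return (error_type, span) pairs for each mention-level
    disagreement between gold and system annotations.

    gold and system map a mention span to an entity title, or to None
    for NIL; a span absent from a side was not detected on that side.
    """
    errors = []
    for span, g in gold.items():
        if span not in system:
            errors.append(("missing", span))      # not exactly detected
            continue
        s = system[span]
        if g is None and s is not None:
            errors.append(("nil-as-link", span))  # linked a NIL mention
        elif g is not None and s is None:
            errors.append(("link-as-nil", span))  # failed to link
        elif g != s:
            errors.append(("wrong-link", span))   # wrong KB node
    for span in system:
        if span not in gold:
            errors.append(("extra", span))        # not in gold standard
    return errors
```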
Running:

```
./cne analyze -g GOLD SYSTEM
```

gives us output like:
```
link-as-nil <doc_id> m"<mention>" g"<entity>" s"None"
wrong-link <doc_id> m"<mention>" g"<entity_a>" s"<entity_b>"
nil-as-link <doc_id> m"<mention>" g"None" s"<entity>"
extra <doc_id> m"<mention>" s"<entity>"
missing <doc_id> m"<mention>" g"<entity>"
```
We can produce a summary by supplying the `-s` option:
```
./cne analyze -s -g GOLD SYSTEM
```

giving us output like:
```
652 extra
114 link-as-nil
1306 missing
333 nil-as-link
606 wrong-link
```
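The summary amounts to counting per-instance error records by type. A minimal sketch, assuming error records are (type, mention) pairs as in the per-instance output above:

```python
from collections import Counter

def summarise(errors):
    """Count error records by type and format one line per type,
    in the same spirit as the -s summary shown above."""
    counts = Counter(kind for kind, _ in errors)
    return ["%6d %s" % (n, kind) for kind, n in sorted(counts.items())]
```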
We provide tools for removing original data from your system outputs: `unstitch` and `stitch`.
```
./cne unstitch SYSTEM > references/SYSTEM
./cne stitch -g GOLD references/SYSTEM > SYSTEM
```
Pip should be able to install directly from this repository:
```
mkdir some_project
cd some_project
virtualenv ve
source ve/bin/activate
pip install git+git://github.com/benhachey/conll03_nel_eval.git#egg=CNE
```