
Basic usage

This document assumes that you have prepared the AIDA-YAGO2-dataset.tsv file as described on the [Data set](Data-set) page.

Evaluating NEL output is complicated and depends on a number of choices. We therefore split evaluation into two subtasks: prepare and evaluate.

For a more in-depth example of usage, please see the run_core_evaluation script.

Preparing your datasets

The aim is to adapt the gold standard and system output so that they can be compared fairly with respect to:

  • document selection: the original dataset includes train, testa and testb splits. We supply reference output on testb.
  • mapping: Wikipedia is a "moving target", and evaluating against a different version can yield different results. We supply mapping files derived from Wikipedia snapshots and fetched from the Wikipedia API.
  • entity link normalisation: we consider the links Tom Cruise, Tom_Cruise and http://en.wikipedia.org/wiki/Tom_Cruise to be equivalent (see the sketch below).
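For illustration only, normalisation along these lines might look like the following Python sketch (the function and the URL-prefix handling are ours, not necessarily what the tool does internally):

def normalise_link(link):
    """Reduce equivalent forms of an entity link to one canonical title.

    For example, "Tom Cruise", "Tom_Cruise" and
    http://en.wikipedia.org/wiki/Tom_Cruise all map to "Tom Cruise".
    """
    prefix = 'http://en.wikipedia.org/wiki/'
    if link.startswith(prefix):
        link = link[len(prefix):]
    return link.replace('_', ' ').strip()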

For example, to prepare the gold-standard data for evaluation (-k keeps only documents whose identifiers match the given regular expression; -m applies the supplied mapping file):

./cne prepare -k ".*testb.*" -m mappings/map-testb-fromapi-20140227.tsv /path/to/AIDA-YAGO2-dataset.tsv > gold-api20140227.testb.txt

And prepare system output in the same way:

./cne prepare -k ".*testb.*" -m mappings/map-testb-fromapi-20140227.tsv system.txt > system-api20140227.testb.txt

Evaluating performance

The main script produces various [evaluation measures](Evaluation-measures). It takes the gold-standard annotation and system output in [AIDA/CoNLL format](Data-format):

./cne evaluate -g gold-api20140227.testb.txt system-api20140227.testb.txt
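At their core, the headline measures reduce to precision, recall and F-score over matched annotations. A minimal sketch of that arithmetic, assuming you already have the counts (this is the standard P/R/F computation, not the tool's own code):

def prf(n_gold, n_system, n_correct):
    """Precision, recall and F1 from annotation counts.

    n_gold: gold annotations; n_system: system annotations;
    n_correct: system annotations that match the gold standard.
    """
    p = n_correct / float(n_system) if n_system else 0.0
    r = n_correct / float(n_gold) if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f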

Variations you may want to consider:

  • some systems use gold-standard mentions; to compare against them, you would need to adapt your system to do the same
  • some systems use score thresholds to select more or less confident links; to compare, you would need to adapt your system to output more or fewer links

Advanced

Fetch a map from Wikipedia redirects

Each line of the mapping file corresponds to one Wikipedia title. The first tab-separated column contains the newer title; any following columns contain names that should map to it (e.g., titles of redirect pages that point to the newer title).
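For illustration, a line might look like this (titles hypothetical; columns are tab-separated):

Tom Cruise	Thomas Cruise Mapother IV	Tom cruise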

We supply some mapping files, but the fetch-mapping script can be used to generate a current redirect mapping using the Wikipedia API:

./cne fetch-mapping GOLD.testb > MAP.testb

If you have an older Wikipedia snapshot, you can generate a mapping file in the same format to perform longitudinal analysis.
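A minimal sketch of writing such a file, assuming you have already extracted (redirect_title, target_title) pairs from your snapshot by other means:

import sys
from collections import defaultdict

def write_mapping(pairs, out=sys.stdout):
    """Write redirect pairs as mapping lines: target title, then its redirects.

    pairs: iterable of (redirect_title, target_title) tuples.
    """
    redirects = defaultdict(list)
    for redirect, target in pairs:
        redirects[target].append(redirect)
    for target, names in sorted(redirects.items()):
        out.write('\t'.join([target] + names) + '\n')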

Knit BAT TagMe output to gold standard CoNLL/AIDA tokenisation

TagMe output from the BAT framework uses a different tokenisation. To map this output into AIDA format over CoNLL tokens:

./cne tagme -a TAGME.xml -t 0.289 GOLD.testb > TAGME.aida

Only annotations with scores greater than the threshold specified by -t are kept.
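Conceptually, the thresholding is just a filter. A sketch over hypothetical (mention, entity, score) triples, not the tool's own data structures:

def apply_threshold(annotations, threshold=0.289):
    """Keep (mention, entity, score) triples scoring above the threshold."""
    return [(m, e, s) for m, e, s in annotations if s > threshold]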

Error analysis

We distinguish five types of errors (a classification sketch follows the list):

  • wrong-link - where we link a mention to the wrong KB node
  • link-as-nil - where we fail to link a mention that is linked in the gold standard
  • nil-as-link - where we link a mention that should not be linked
  • missing - where we fail to detect a gold-standard mention exactly
  • extra - where we detect a mention that does not exactly match any in the gold standard
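These categories can be read as a simple decision over aligned annotations. A sketch, assuming annotations are dicts keyed by (doc_id, span) with a KB entity or None (NIL) as value; this mirrors the definitions above rather than the tool's internals:

def classify_errors(gold, system):
    """Yield (error_type, doc_id, span, gold_entity, system_entity) tuples."""
    for key, g in gold.items():
        doc_id, span = key
        if key not in system:
            yield 'missing', doc_id, span, g, None
        elif g is None and system[key] is not None:
            yield 'nil-as-link', doc_id, span, g, system[key]
        elif g is not None and system[key] is None:
            yield 'link-as-nil', doc_id, span, g, system[key]
        elif g != system[key]:
            yield 'wrong-link', doc_id, span, g, system[key]
    for key, s in system.items():
        if key not in gold:
            doc_id, span = key
            yield 'extra', doc_id, span, None, s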

Running

./cne analyze -g GOLD SYSTEM

gives output like:

link-as-nil	<doc_id>	m"<mention>"	g"<entity>"	s"None"
wrong-link	<doc_id>	m"<mention>"	g"<entity_a>"	s"<entity_b>"
nil-as-link	<doc_id>	m"<mention>"	g"None"	s"<entity>"
extra	<doc_id>	m"<mention>"	s"None"
missing	<doc_id>	m"<mention>"	g"<entity>"

We can produce a summary by supplying the -s option:

./cne analyze -s -g GOLD SYSTEM

This gives output like:

 652 extra
 114 link-as-nil
1306 missing
 333 nil-as-link
 606 wrong-link

Sharing system output for comparison

We provide tools for removing the original text from your system output so that annotations can be shared without redistributing the underlying corpus: unstitch strips it out, and stitch restores it from the gold standard.

./cne unstitch SYSTEM > references/SYSTEM
./cne stitch -g GOLD references/SYSTEM > SYSTEM

Installing for programmatic use

Pip should be able to install directly from this repository:

mkdir some_project
cd some_project
virtualenv ve
source ve/bin/activate
pip install git+https://github.com/benhachey/conll03_nel_eval.git#egg=CNE