# Benchmark results
Hoffart et al. (2011) use non-NIL, gold-standard mentions as system input. They report 81.8 `strong_link_match` precision, which they refer to as p@1. An updated score of 82.5 is reported on their project web site.
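Since gold-standard mentions are supplied as input, `strong_link_match` precision reduces to the fraction of gold mentions whose top-ranked system entity exactly matches the gold entity. A minimal sketch, assuming mentions are keyed by hypothetical `(doc_id, start, end)` spans and entities by string IDs (the data layout is an assumption, not the evaluated systems' format):

```python
def strong_link_match_precision(gold, predicted):
    """Fraction of gold mentions whose predicted top entity exactly
    matches the gold entity. `gold` and `predicted` map mention spans
    (doc_id, start, end) to entity IDs; spans are illustrative."""
    if not gold:
        return 0.0
    hits = sum(1 for span, entity in gold.items()
               if predicted.get(span) == entity)
    return hits / len(gold)

# Toy example: two gold mentions, one linked correctly.
gold = {("d1", 0, 5): "Berlin", ("d1", 10, 15): "Paris"}
pred = {("d1", 0, 5): "Berlin", ("d1", 10, 15): "Paris_Hilton"}
print(strong_link_match_precision(gold, pred))  # 0.5
```

With gold mentions as input, every mention receives a prediction, so this precision coincides with accuracy, which is consistent with the "accuracy" naming used by Hoffart et al. (2012) below.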
Hoffart et al. (2011) also report 89.1 mean average precision (MAP) for a slightly different system configuration. This is calculated over mention-entity pairs ordered by decreasing system confidence.
TODO: Verify that there is only one entity per mention in the MAP calculation and that mentions are ordered by disambiguation confidence.
TODO: Ask whether mentions are ordered by document or globally across all documents.
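One reading of this MAP calculation, under the assumptions the TODOs above are meant to verify (one entity per mention, a single ranking by decreasing confidence), is average precision over the ranked list of mention-entity pairs, with the number of gold mentions as the denominator. A hedged sketch:

```python
def average_precision(ranked_correct, n_gold):
    """Average precision over mention-entity pairs sorted by decreasing
    system confidence. `ranked_correct[k]` is True iff the pair at rank
    k+1 matches the gold annotation; `n_gold` is the number of gold
    mentions (the denominator choice is one interpretation, not a
    documented detail of the papers above)."""
    if n_gold == 0:
        return 0.0
    hits = 0
    total = 0.0
    for k, correct in enumerate(ranked_correct, start=1):
        if correct:
            hits += 1
            total += hits / k  # precision at this rank
    return total / n_gold

# Four ranked pairs, three correct: (1/1 + 2/2 + 3/4) / 4
print(average_precision([True, True, False, True], 4))  # 0.6875
```

Whether the ranking is per-document (giving a mean of per-document APs) or global changes the result, which is exactly the second open question above.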
Hoffart et al. (2012) use non-NIL, gold-standard mentions as system input. They report 82.3 `strong_link_match` precision, which they refer to as accuracy.
Pilz & Paass (2012) use non-NIL, gold-standard mentions as system input. They report 82.2 `entity_link_match` F-score, which they refer to as F_BOT.
Pilz & Paass also report 89.3 MAP, where mention-entity pairs are ordered by decreasing system confidence. This is compared to the 89.1 MAP reported by Hoffart et al. (2011).
TODO: Clarify how F_BOT accounts for sequential input order.
TODO: Verify that there is only one entity per mention in the MAP calculation.
Cornolti et al. (2013) compare a number of systems:
| System | Reference | `weak_link_match` | `strong_link_match` | `weak_mention_match` | `entity_link_match` |
|---|---|---|---|---|---|
| TagMe2 | Ferragina & Scaiella (2010) | 58.3 | 56.7 | 74.6 | 65.6 |
| Illinois Wikifier | Ratinov et al. (2011) | 54.0 | 50.7 | 68.5 | 56.6 |
| Wikipedia Miner | Milne & Witten (2008) | 49.7 | 46.0 | 70.0 | 50.9 |
| AIDA | Hoffart et al. (2011) | 46.7 | 47.4 | 58.7 | 55.7 |
| DBpedia Spotlight | Mendes et al. (2011) | 35.2 | 33.5 | 50.4 | 35.9 |