Wikidata Pipeline

Jan Ehmueller edited this page Jul 27, 2017 · 8 revisions

The Wikidata pipeline describes the import of structured data from Wikidata. The import follows the guidelines of the Structured Data Import. All relevant jobs are listed below in their execution order. Note that this pipeline assumes Implisense has already been imported.

Normalization

  1. WikidataImport
  2. TagEntities
  3. ResolveEntities
  4. WikidataDataLakeImport
  5. Find Relations

Duplicate Detection

  1. Deduplication with config file deduplication_wikidata.xml

Data Merge

  1. Merging
  2. MasterConnecting
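
The execution order above can be written down as a small ordered structure. This is only a sketch for illustration: the stage and job names are copied from the lists on this page, but `PIPELINE`, `JOB_CONFIG` and `execution_order` are hypothetical names and are not part of the actual project code. It does not launch any job; it only prints the order in which they would run.

```python
# Stages and jobs in the order given on this page.
# The structure and all identifiers here are illustrative assumptions.
PIPELINE = [
    ("Normalization", [
        "WikidataImport",
        "TagEntities",
        "ResolveEntities",
        "WikidataDataLakeImport",
        "Find Relations",
    ]),
    ("Duplicate Detection", ["Deduplication"]),
    ("Data Merge", ["Merging", "MasterConnecting"]),
]

# Config file named on this page for the Deduplication job.
JOB_CONFIG = {"Deduplication": "deduplication_wikidata.xml"}


def execution_order(pipeline):
    """Flatten the staged pipeline into a list of (stage, job) pairs."""
    return [(stage, job) for stage, jobs in pipeline for job in jobs]


if __name__ == "__main__":
    for step, (stage, job) in enumerate(execution_order(PIPELINE), start=1):
        config = JOB_CONFIG.get(job)
        suffix = f" (config: {config})" if config else ""
        print(f"{step}. [{stage}] {job}{suffix}")
```

Keeping the stages as a list of pairs rather than a dictionary makes the ordering explicit, which matters here because each job depends on the output of the previous ones.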

Next step: DBpedia