A "semantic bridge" between OSM and Wikidata by reciprocal identification

OSMBrasil/semantic-bridge

semantic-bridge

A "semantic bridge" between OpenStreetMap (OSM) and Wikidata (WD) by reciprocal identification.

Basics

Every relevant OSM feature can be tagged with Key:wikidata, pointing to its Wikidata semantic, i.e. a Wikidata item ID.

When the Wikidata item that is pointed to (a Wikidata ID) also has a pointer back to OSM, the "semantic bridge" has been built (!), giving complete authority control with reciprocal use. The lookup.csv table lists the OSM features that offer this reciprocity.
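The reciprocity condition can be sketched as a small predicate. This is a minimal illustration, not the project's code: it assumes the OSM tags and the OSM references stored on the Wikidata item have already been fetched, and the `is_reciprocal` function and the representation of references as `(osm_type, osm_id)` pairs are hypothetical.

```python
from typing import Mapping, Set, Tuple

# Hypothetical representation: an OSM element is identified by (type, id),
# where type is "N", "W" or "R" as in lookup.csv.
OsmRef = Tuple[str, int]


def is_reciprocal(
    osm_ref: OsmRef,
    osm_tags: Mapping[str, str],
    wd_id: str,
    wd_osm_refs: Set[OsmRef],
) -> bool:
    """True when OSM points to the Wikidata item AND the item points back.

    osm_tags: tags of the OSM element (the "wikidata" key holds a Q-ID).
    wd_osm_refs: OSM references found on the Wikidata item (assumed to be
    already extracted from its statements).
    """
    points_to_wd = osm_tags.get("wikidata") == wd_id
    points_back = osm_ref in wd_osm_refs
    return points_to_wd and points_back


# Rows taken from the lookup table; the Wikidata-side sets are illustrative.
print(is_reciprocal(("R", 59470), {"wikidata": "Q155"}, "Q155", {("R", 59470)}))  # True
print(is_reciprocal(("W", 75488634), {"wikidata": "Q2880208"}, "Q2880208", set()))  # False
```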

As of July 2018, there are:

The lookup as certification

Some examples and a description of the fields of lookup.csv, the main dataset of this project.

wdId      osm_type  osm_id             isReciprocal  check_date
Q155      R         59470 (js)         y             2018-07-06
Q17061    R         23092 (js fail)    y             2018-07-06
Q2880208  W         75488634 (js)      n             2018-07-06
Q2500246  N         817882603 (js)     n             2018-07-06
...       ...       ...                ...           ...
  • wdId: the Wikidata ID; it can be resolved at http://wikidata.org/entity/{wdId}
  • osm_type: the OSM data type used to represent the feature: R = Relation (e.g. a boundary polygon), W = Way (line), N = Node (point).
  • osm_id: the ID assigned to the OSM feature as of check_date.
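The three required fields are enough to build links to both sides of the bridge. A minimal sketch: the Wikidata pattern is the one given above, while the openstreetmap.org element-page pattern and the helper names `wikidata_url` and `osm_url` are assumptions for illustration.

```python
# Map the one-letter osm_type codes of lookup.csv to the path segment
# used by openstreetmap.org element pages.
OSM_TYPE_PATH = {"N": "node", "W": "way", "R": "relation"}


def wikidata_url(wd_id: str) -> str:
    """Resolve a wdId with the entity URL pattern from the field list."""
    return f"http://wikidata.org/entity/{wd_id}"


def osm_url(osm_type: str, osm_id: int) -> str:
    """Build the openstreetmap.org page URL of an OSM element."""
    try:
        segment = OSM_TYPE_PATH[osm_type]
    except KeyError:
        raise ValueError(f"unknown osm_type {osm_type!r}; expected N, W or R") from None
    return f"https://www.openstreetmap.org/{segment}/{osm_id}"


print(wikidata_url("Q155"))  # http://wikidata.org/entity/Q155
print(osm_url("R", 59470))   # https://www.openstreetmap.org/relation/59470
```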

The lookup itself does not need all of these fields, but for illustration the table above adds:

  • isReciprocal: a flag indicating whether the Wikidata and OSM references point to each other (y) or not (n).
  • check_date: the ISO 8601 date on which the last check was performed.
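Reading the lookup can be sketched with the standard `csv` module. This assumes lookup.csv is comma-separated with a header row named as in the table above (the real file produced by /src may differ), and treats isReciprocal and check_date as optional, since the lookup does not need them. `LookupRow` and `read_lookup` are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class LookupRow:
    wd_id: str
    osm_type: str                   # "N", "W" or "R"
    osm_id: int
    is_reciprocal: Optional[bool]   # None when the column is absent or empty
    check_date: Optional[date]      # None when the column is absent or empty


def read_lookup(text: str) -> List[LookupRow]:
    """Parse lookup.csv content (assumed comma-separated with a header row)."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        flag = rec.get("isReciprocal")
        day = rec.get("check_date")
        rows.append(
            LookupRow(
                wd_id=rec["wdId"],
                osm_type=rec["osm_type"],
                osm_id=int(rec["osm_id"]),
                is_reciprocal=None if not flag else flag == "y",
                check_date=None if not day else date.fromisoformat(day),
            )
        )
    return rows


sample = """wdId,osm_type,osm_id,isReciprocal,check_date
Q155,R,59470,y,2018-07-06
Q2880208,W,75488634,n,2018-07-06
"""
rows = read_lookup(sample)
print([r.wd_id for r in rows if r.is_reciprocal])  # ['Q155']
```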

The lookup and its error-log CSV (lookup_errors_WIKIDATA) are generated by software; see /src.

Dumps as sources for comparisons

There are two big dump files in the data/dump folder:

As discussed in "Preparing OSM dumps", the extraction can be expressed as an Overpass query and used to generate samples, but Overpass cannot do the real task because the data is really big. The work can be split by country, which also fits better with specialized curators... But even when splitting, we need the Osmium tools to generate the dump files. So the v0.1 check uses the online tools, which is a lazy solution, and the project is therefore producing only samples.
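A per-country Overpass extraction of this kind might look like the sketch below. It is an assumption, not the query from "Preparing OSM dumps": it selects every node, way and relation with a `wikidata` tag inside the country whose `ISO3166-1` tag matches, and asks for headerless CSV output. The public endpoint and the helper names are illustrative, and `fetch` is the only function that touches the network.

```python
import urllib.parse
import urllib.request
from typing import List, Tuple

# One public Overpass API instance; others exist.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_query(iso_code: str, timeout_s: int = 900) -> str:
    """Overpass QL: all nodes/ways/relations tagged wikidata=* inside a country.

    Output is CSV without a header line: type,id,wikidata.
    """
    return (
        f'[out:csv(::type,::id,"wikidata";false;",")][timeout:{timeout_s}];\n'
        f'area["ISO3166-1"="{iso_code}"][admin_level=2]->.country;\n'
        'nwr["wikidata"](area.country);\n'
        "out;\n"
    )


def parse_csv_output(text: str) -> List[Tuple[str, int, str]]:
    """Turn lines like 'relation,59470,Q155' into (type, id, wikidata) tuples."""
    result = []
    for line in text.splitlines():
        if not line.strip():
            continue
        osm_type, osm_id, wd_id = line.split(",", 2)
        result.append((osm_type, int(osm_id), wd_id))
    return result


def fetch(query: str) -> str:
    """POST the query to Overpass (network call; big countries may time out)."""
    data = urllib.parse.urlencode({"data": query}).encode()
    with urllib.request.urlopen(OVERPASS_URL, data=data) as resp:
        return resp.read().decode("utf-8")
```

Usage would be `parse_csv_output(fetch(build_query("BR")))`. For a country as large as Brazil this is exactly the kind of request that tends to exceed the server's limits, which is why it suits samples rather than the full check.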

Towards a microservice to offer the lookup-table

The service will be a hub for name resolution, in the sense of URN resolution (standard terminology since 1997). The first step is to offer Wikidata's P402 property (OpenStreetMap relation ID) a persistent URL template, i.e. Persistent URLs, something like
  http://wd.openstreetmap.org/{wikidata_id}
for a redirection service, and
  http://wd.openstreetmap.org/{otherName}
for an official-reference redirection service (resolving the main official synonyms, such as country ISO codes or local ISO administrative codes).
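The redirection logic behind these two URL templates can be sketched as a pure function returning an HTTP status and `Location`. This is a minimal sketch under stated assumptions: the in-memory `LOOKUP` and `SYNONYMS` tables are hypothetical stand-ins for data loaded from lookup.csv, the target of the redirect (the openstreetmap.org element page) is assumed, and 302 is chosen because it is the usual PURL redirect code.

```python
import re
from typing import Dict, Optional, Tuple

WD_ID = re.compile(r"Q[1-9][0-9]*")
OSM_TYPE_PATH = {"N": "node", "W": "way", "R": "relation"}

# Hypothetical tables; a real service would load them from lookup.csv.
LOOKUP: Dict[str, Tuple[str, int]] = {"Q155": ("R", 59470)}
SYNONYMS: Dict[str, str] = {"BR": "Q155"}  # ISO 3166-1 code -> Wikidata ID


def resolve(path_segment: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location) for http://wd.openstreetmap.org/{segment}."""
    if WD_ID.fullmatch(path_segment):
        wd_id: Optional[str] = path_segment       # {wikidata_id} template
    else:
        wd_id = SYNONYMS.get(path_segment)        # {otherName} template
    if wd_id is None or wd_id not in LOOKUP:
        return 404, None
    osm_type, osm_id = LOOKUP[wd_id]
    return 302, f"https://www.openstreetmap.org/{OSM_TYPE_PATH[osm_type]}/{osm_id}"


print(resolve("Q155"))  # (302, 'https://www.openstreetmap.org/relation/59470')
print(resolve("BR"))    # (302, 'https://www.openstreetmap.org/relation/59470')
```

Keeping resolution as a pure function leaves the choice of web framework open and makes the lookup logic testable on its own.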

The other resolution services (ISO to Wikidata, OSM to Wikidata, Wikidata to OSM, etc.), including canonicalization of OSM elements (several OSM elements carrying the same Wikidata tag), will use something like
  http://urn.openstreetmap.org/{namespace}:{name}/{method}
with standard methods, as demonstrated by the ISSN-L-Resolver project. The namespace parameter plays the role of a URN scheme; the name can be an official name or a valid ID within the namespace.
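Splitting such a path into its three parts is straightforward. A minimal sketch, assuming the namespace is everything before the first `:` and the method is the last path segment; `UrnRequest`, `parse_urn_path` and the `iso3166-1` namespace name are hypothetical. The namespace is lower-cased because URN namespace identifiers are case-insensitive.

```python
from typing import NamedTuple


class UrnRequest(NamedTuple):
    namespace: str
    name: str
    method: str


def parse_urn_path(path: str) -> UrnRequest:
    """Split '/{namespace}:{name}/{method}' into its three parts.

    The method is the last '/'-separated segment; the namespace is the text
    before the first ':'; the name is everything in between.
    """
    body = path.strip("/")
    urn, sep, method = body.rpartition("/")
    if not sep or not method:
        raise ValueError(f"missing method in {path!r}")
    namespace, colon, name = urn.partition(":")
    if not colon or not namespace or not name:
        raise ValueError(f"expected namespace:name in {path!r}")
    return UrnRequest(namespace.lower(), name, method)


print(parse_urn_path("/iso3166-1:BR/N2C"))
# UrnRequest(namespace='iso3166-1', name='BR', method='N2C')
```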

The method defines the endpoint of a service. Typical methods, implemented as JSON services, are N2C (name-to-canonical) to obtain the canonical name, isN to check that a name exists, N2Ns (name-to-names) to list all official synonyms of a name... and info to return a catalog with all the basic metadata describing the referred (canonical) item.
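The four methods can be sketched over one namespace's synonym table. The table below is a hypothetical example for an ISO 3166-1 namespace (Brazil's alpha-2, alpha-3 and numeric codes, with Q155 as its Wikidata item); the JSON envelope and the `run_method` name are assumptions, not the ISSN-L-Resolver's actual response format.

```python
import json
from typing import Dict, List

# Hypothetical namespace table: canonical name -> official synonyms.
CANONICAL: Dict[str, List[str]] = {"BR": ["BRA", "076"]}
# Reverse index: any known name (canonical or synonym) -> canonical name.
TO_CANONICAL: Dict[str, str] = {
    alias: canon for canon, aliases in CANONICAL.items() for alias in [canon, *aliases]
}
# Hypothetical metadata about each canonical item.
INFO: Dict[str, dict] = {"BR": {"wikidata": "Q155"}}


def run_method(name: str, method: str) -> str:
    """Answer isN, N2C, N2Ns or info for a name, as a JSON string."""
    canon = TO_CANONICAL.get(name)
    if method == "isN":
        result = canon is not None
    elif method == "N2C":
        result = canon
    elif method == "N2Ns":
        result = [canon, *CANONICAL[canon]] if canon else []
    elif method == "info":
        result = {"canonical": canon, **INFO.get(canon, {})} if canon else None
    else:
        raise ValueError(f"unknown method {method!r}")
    return json.dumps({"name": name, "method": method, "result": result})


print(run_method("BRA", "N2C"))  # {"name": "BRA", "method": "N2C", "result": "BR"}
```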

The curators

... Organizing by country. Each "community of curators" checks its own data and errors, and makes corrections.


  Contents and data of this project are dedicated to