A "semantic bridge" between OpenStreetMap (OSM) and Wikidata (WD) by reciprocal identification.
Every relevant OSM feature can be tagged with the `wikidata` key, pointing to its Wikidata entity.
When the Wikidata entity it points to (a Wikidata ID) also holds a pointer back to OSM, the "semantic bridge" is built: there is complete authority control with reciprocal use. The `lookup.csv` table lists the OSM features that offer this reciprocity.
As of July 2018 there are:
- ~1,123,500 OSM features with a `wikidata` key.
- ~63,000 Wikidata entities with the OSM relation ID (`P402`) property pointing to OSM.
- 5% of errors in a sample of 2000 from Wikidata: ~1900 items passed the test (a check ensuring that each OSM feature was really tagged with a reciprocal Wikidata identification) and constitute the lookup table.
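The reciprocity test can be sketched as a comparison of the two pointer directions. This is only a minimal illustration, not the project's actual checker (that lives in `/src`). The function name `is_reciprocal` and the two in-memory dictionaries are assumptions made for the example. The IDs are taken from this README's example rows.

```python
# Hypothetical in-memory views of both sides (not the project's real data model).
# OSM side: (osm_type, osm_id) -> value of the feature's `wikidata` tag.
osm_to_wd = {
    ("R", 59470): "Q155",
    ("W", 75488634): "Q2880208",
}
# Wikidata side: entity ID -> OSM element it points to (e.g. through P402).
wd_to_osm = {
    "Q155": ("R", 59470),
}


def is_reciprocal(osm_ref, wd_id, osm_to_wd, wd_to_osm):
    """Return True only when OSM points to wd_id AND Wikidata points back."""
    return osm_to_wd.get(osm_ref) == wd_id and wd_to_osm.get(wd_id) == osm_ref


reciprocal_q155 = is_reciprocal(("R", 59470), "Q155", osm_to_wd, wd_to_osm)
reciprocal_way = is_reciprocal(("W", 75488634), "Q2880208", osm_to_wd, wd_to_osm)
```

Only the first pair is a two-way link. The second has an OSM tag but no pointer back from Wikidata, so it would be flagged `n`.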
Some examples and field descriptions for `lookup.csv`, the main dataset of this project.
wdId | osm_type | osm_id | isReciprocal | check_date |
---|---|---|---|---|
Q155 | R | 59470 (js) | y | 2018-07-06 |
Q17061 | R | 23092 (js fail) | y | 2018-07-06 |
Q2880208 | W | 75488634 (js) | n | 2018-07-06 |
Q2500246 | N | 817882603 (js) | n | 2018-07-06 |
... | ... | ... | ... | ... |
- `wdId`: the Wikidata ID, which can be resolved by `http://wikidata.org/entity/{wdId}`.
- `osm_type`: the OSM datatype used to represent the feature. `R`=Relation (polygon), `W`=Way (line), `N`=Node (point).
- `osm_id`: the ID attributed to the OSM feature at the `check_date`.
The lookup does not need all these fields, but for illustration the table above adds:
- `isReciprocal`: a flag saying whether the Wikidata and OSM indications are reciprocal (`y` or `n`).
- `check_date`: an ISO 8601 date, when the last checking procedure was performed.
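As an illustration of the fields described above, the following sketch reads a few lookup rows and expands them into URLs. It assumes the file is comma-separated with a header row named exactly as in the table; the real `lookup.csv` layout may differ. The helper name `read_lookup` is hypothetical. The OSM URL pattern `https://www.openstreetmap.org/{type}/{id}` is the public OSM website convention, not something defined by this project.

```python
import csv
import io
from datetime import date

# Map the one-letter osm_type codes to OSM element names.
OSM_TYPES = {"R": "relation", "W": "way", "N": "node"}

# Assumed CSV layout (header names copied from the example table).
SAMPLE = """wdId,osm_type,osm_id,isReciprocal,check_date
Q155,R,59470,y,2018-07-06
Q2880208,W,75488634,n,2018-07-06
"""


def read_lookup(text):
    """Parse lookup rows into dictionaries with resolved URLs and typed fields."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        element = OSM_TYPES[rec["osm_type"]]
        rows.append({
            "wd_url": f"http://wikidata.org/entity/{rec['wdId']}",
            "osm_url": f"https://www.openstreetmap.org/{element}/{rec['osm_id']}",
            "is_reciprocal": rec["isReciprocal"] == "y",
            "check_date": date.fromisoformat(rec["check_date"]),
        })
    return rows


rows = read_lookup(SAMPLE)
```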
The lookup and its CSV error log (`lookup_errors_WIKIDATA`) are generated by software; see `/src`.
There are two big dump files in the `data/dump` folder:
- osm_relation.csv with pairs of osm_relationId-wdId fields;
- osm_way.csv with pairs of osm_relationId-wdId fields;
As commented in "Preparing OSM dumps", the extraction can be expressed as Overpass queries to generate samples, but not to do the real task, because the data is really big. The data can be split into countries, which also suits specialized curators, but even after splitting we need Osmium tools to generate the dump files. So the v0.1 check uses the online tools, a lazy solution, and the project produces only samples.
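A per-country sample of the kind mentioned above could be requested with an Overpass QL query like the one built below. This is a sketch under assumptions, not the query used by the project: the function name `wikidata_relations_query`, the choice of relations only, the country filter by `ISO3166-1` code, and the default timeout are all illustrative. The code only builds the query text; sending it to an Overpass server is left out.

```python
def wikidata_relations_query(iso_country, timeout=180):
    """Build an Overpass QL query listing relations tagged with `wikidata`
    inside one country, as CSV rows of type, id and wikidata value."""
    return (
        # CSV output: element type, element id, wikidata tag; no header row.
        f'[out:csv(::type, ::id, wikidata; false; ",")][timeout:{timeout}];\n'
        # Select the country area by its ISO 3166-1 alpha-2 code.
        f'area["ISO3166-1"="{iso_country}"][admin_level=2]->.country;\n'
        # All relations in that area that carry a wikidata tag.
        'relation["wikidata"](area.country);\n'
        "out;"
    )


query = wikidata_relations_query("BR")
```

Even limited to one country, such a query can be heavy for the public servers, which is why the full task needs local dump processing.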
The service will be a hub for name resolution, in the sense of URN resolution (standard terminology since 1997). The first step is to offer Wikidata's `P402` a persistent URL template, something like
- `http://wd.openstreetmap.org/{wikidata_id}` for the redirection service,
- `http://wd.openstreetmap.org/{otherName}` for the official-reference redirection service (to main official synonyms such as country ISO codes or local ISO administrative codes).
The other resolution services (ISO to Wikidata, OSM to Wikidata, Wikidata to OSM, etc.), including canonicalization of OSM elements (duplicates of a Wikidata tag at OSM), will use something like
`http://urn.openstreetmap.org/{namespace}:{name}/{method}`
with standard methods, as shown by the ISSN-L-Resolver project. The `namespace` parameter is like a URN schema; the `name` can be an official name or a valid ID for the namespace.
The `method` defines the endpoint of a service. Typical methods implemented as JSON services are `N2C` (name-to-canonic) to obtain the canonical name, `isN` to check that the name exists, `N2Ns` (name-to-names) to list all official synonyms of a name, and `info` to return a catalog with all basic metadata describing the referred (canonical) item.
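To make the proposed template concrete, here is a small sketch that builds and parses resolver URLs of the form `{namespace}:{name}/{method}`. The service does not exist yet, so everything here is illustrative: the function names `build_url` and `parse_url`, the percent-encoding choice, and the example namespace value `wikidata` are assumptions. Only the host, the template and the four method names come from the text above.

```python
from urllib.parse import quote, unquote, urlsplit

BASE = "http://urn.openstreetmap.org"
# The typical methods named in the text.
METHODS = {"N2C", "isN", "N2Ns", "info"}


def build_url(namespace, name, method):
    """Fill the {namespace}:{name}/{method} template, escaping unsafe characters."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    return f"{BASE}/{quote(namespace, safe='')}:{quote(name, safe='')}/{method}"


def parse_url(url):
    """Split a resolver URL back into (namespace, name, method)."""
    path = urlsplit(url).path.lstrip("/")
    urn_part, _, method = path.rpartition("/")      # method is the last segment
    namespace, _, name = urn_part.partition(":")    # first colon ends the namespace
    return unquote(namespace), unquote(name), method


url = build_url("wikidata", "Q155", "N2C")
parts = parse_url(url)
```

Escaping the name keeps a `/` or `:` inside an official name from being confused with the template separators.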
... Organizing by country. Each "community of curators" checks its own data and errors, and makes the corrections.