Wikidata old
This is an older version of the Wikidata page, written before we started getting ALL Wikidata properties automatically.
We get Wikidata from the Wikidata API via wbgetentities. Our data['properties'] are all the Wikidata properties that we listen for (via wptools.wikidata.LABELS) when we call get_wikidata(). Some of these properties have a value that we can use immediately. Others have claims ("Q values" or "items") that must be resolved with another API call (get_claims()).
We call them "claims" because that is how properties are presented by the Wikidata API (entities[<item>][claims][<property>]). We capture only a selection of properties because the list of all properties is enormous and still growing.
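The path above can be walked with plain dictionary access. What follows is a minimal sketch, not the wptools implementation: claim_values is a hypothetical helper, and the sample entity is a hand-trimmed fragment shaped like a wbgetentities response, where each claim keeps its value under mainsnak, datavalue, value.

```python
def claim_values(entity, prop):
    """Return the raw values recorded for one property of an entity."""
    values = []
    for claim in entity.get('claims', {}).get(prop, []):
        datavalue = claim.get('mainsnak', {}).get('datavalue')
        if datavalue is None:
            # "no value" / "unknown value" statements carry no datavalue
            continue
        value = datavalue['value']
        if isinstance(value, dict):
            # items carry an 'id' (a Q value), times carry a 'time' string
            value = value.get('id') or value.get('time') or value
        values.append(value)
    return values


# Hand-trimmed sample for illustration, not a full API response
entity = {
    'claims': {
        'P136': [{'mainsnak': {'datavalue': {
            'type': 'wikibase-entityid',
            'value': {'entity-type': 'item', 'id': 'Q8341'}}}}],
        'P856': [{'mainsnak': {'datavalue': {
            'type': 'string',
            'value': 'http://www.artblakey.com'}}}],
    }
}

genre = claim_values(entity, 'P136')
website = claim_values(entity, 'P856')
print(genre, website)
```

A property that is absent from the entity simply yields an empty list, which keeps the caller free of KeyError handling.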
Let's look at an example:
>>> art = wptools.page('Art Blakey').get_wikidata()
www.wikidata.org (wikidata) Art Blakey
www.wikidata.org (claims) Q6581097|Q8341|Q30|Q5|Q9048913
en.wikipedia.org (imageinfo) File:Art Blakey08.JPG
Art Blakey (en) data
{
aliases: <list(2)> Abdullah Ibn Buhaina, Arthur Blakey
claims: <dict(5)> Q6581097, Q8341, Q30, Q5, Q9048913
description: American jazz drummer and bandleader
image: <list(1)> {'kind': 'wikidata-image', u'descriptionshortur...
label: Art Blakey
modified: <dict(1)> wikidata
pageid: 299895
properties: <dict(10)> P136, P345, P910, P27, P21, P856, P569, P...
title: Art_Blakey
what: human
wikibase: Q311715
wikidata: <dict(10)> website, category, death, citizenship, gend...
wikidata_url: https://www.wikidata.org/wiki/Q311715
}
Here are the properties and values we found for Art Blakey with get_wikidata():
>>> art.data['properties']
{u'P136': [u'Q8341'],
u'P18': [u'Art Blakey08.JPG'],
u'P27': [u'Q30'],
u'P31': [u'Q5'],
u'P345': [u'nm0086845'],
u'P569': [u'+1919-10-11T00:00:00Z'],
u'P570': [u'+1990-10-16T00:00:00Z'],
u'P856': [u'http://www.artblakey.com'],
u'P910': [u'Q9048913']}
And here are the properties and labels we listened for:
>>> sorted([{x:art.LABELS[x]} for x in art.LABELS if x in art.data['properties']])
[{'P136': 'genre'},
{'P18': 'image'},
{'P27': 'citizenship'},
{'P31': 'instance'},
{'P345': 'IMDB'},
{'P569': 'birth'},
{'P570': 'death'},
{'P856': 'website'},
{'P910': 'category'}]
Some property values are useful right away:
P856: website = http://www.artblakey.com
So that gets put in wikidata with the meaningful label we defined in wptools.wikidata.LABELS:
>>> art.data['wikidata']['website']
u'http://www.artblakey.com'
But other property values are "claims" that need to be resolved:
P136: genre = Q8341
Property values that start with "Q" get put into claims:
>>> art.claims
{u'Q30': 'citizenship',
u'Q5': 'instance',
u'Q8341': 'genre',
u'Q9048913': 'category'}
That says that art has the Wikidata item Q8341 (jazz) for his genre.
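The split between immediately useful values and claims can be sketched in a few lines. This is an illustration under stated assumptions rather than the wptools code: split_properties is a hypothetical helper, it looks only at the first value of each property, and it treats any value matching "Q" followed by digits as a claim. The input data is copied from the output above.

```python
import re

Q_VALUE = re.compile(r'^Q\d+$')  # assumed pattern for Wikidata item ids


def split_properties(properties, labels):
    """Sort property values into usable values and unresolved claims."""
    wikidata, claims = {}, {}
    for prop, values in properties.items():
        label = labels.get(prop)
        if label is None or not values:
            continue  # a property we do not listen for, or no value
        value = values[0]  # assumption: only the first value is kept
        if Q_VALUE.match(str(value)):
            claims[value] = label    # needs a second lookup
        else:
            wikidata[label] = value  # usable right away
    return wikidata, claims


labels = {'P136': 'genre', 'P27': 'citizenship', 'P31': 'instance',
          'P856': 'website', 'P910': 'category'}
properties = {'P136': ['Q8341'], 'P27': ['Q30'], 'P31': ['Q5'],
              'P856': ['http://www.artblakey.com'], 'P910': ['Q9048913']}

wikidata, claims = split_properties(properties, labels)
print(wikidata)
print(claims)
```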
When we get unresolved claims, the tool calls get_claims(), as seen in the output above:
www.wikidata.org (claims) Q8341|Q30|Q5|Q9048913
You can find the claims query in the cache:
>>> art.query('claims')
u'https://www.wikidata.org/w/api.php?action=wbgetentities&formatversion=2&ids=Q8341|Q30|Q5|Q9048913&languages=en&props=labels&redirects=yes&sites=&titles='
We reuse the action=wbgetentities query with no title, passing the "Q values" (items) in the ids parameter.
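To make the shape of that query concrete, here is a minimal sketch that rebuilds the same URL with the standard library. claims_query is a hypothetical name, not a wptools function; the parameters are the ones visible in the cached query above.

```python
from urllib.parse import urlencode


def claims_query(qids, lang='en'):
    """Build a wbgetentities URL that asks only for the labels of items."""
    params = {
        'action': 'wbgetentities',
        'formatversion': 2,
        'ids': '|'.join(qids),  # Q values are pipe-separated
        'languages': lang,
        'props': 'labels',      # labels are all we need to resolve a claim
        'redirects': 'yes',
        'sites': '',            # no site/title lookup for this query
        'titles': '',
    }
    # keep '|' unescaped so the URL reads like the cached one
    return 'https://www.wikidata.org/w/api.php?' + urlencode(params, safe='|')


url = claims_query(['Q8341', 'Q30', 'Q5', 'Q9048913'])
print(url)
```

Sending all ids in one request keeps the number of API calls at one per page, no matter how many claims need resolving.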
We then update the wikidata attribute with the fully resolved value for each claim:
>>> art.data['wikidata']
{'IMDB': u'nm0086845',
'birth': u'+1919-10-11T00:00:00Z',
'category': None,
'citizenship': u'United States of America',
'death': u'+1990-10-16T00:00:00Z',
'genre': u'jazz',
'image': u'Art Blakey08.JPG',
'instance': u'human',
'website': u'http://www.artblakey.com'}
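The resolution step can be sketched as a lookup from each Q value to its label in the claims response. This is an illustration, not the wptools code: resolve_claims is a hypothetical helper, the entities dictionary is a hand-written sample shaped like the labels part of a wbgetentities response, and the empty labels for Q9048913 are an assumption chosen to reproduce the None shown above.

```python
def resolve_claims(claims, entities, lang='en'):
    """Map each claim's label to the item's name in the given language."""
    resolved = {}
    for qid, label in claims.items():
        item_labels = entities.get(qid, {}).get('labels', {})
        # None when the item has no label in the requested language
        resolved[label] = item_labels.get(lang, {}).get('value')
    return resolved


claims = {'Q30': 'citizenship', 'Q5': 'instance',
          'Q8341': 'genre', 'Q9048913': 'category'}

# Hand-written sample for illustration, not a real API response
entities = {
    'Q30': {'labels': {'en': {'language': 'en',
                              'value': 'United States of America'}}},
    'Q5': {'labels': {'en': {'language': 'en', 'value': 'human'}}},
    'Q8341': {'labels': {'en': {'language': 'en', 'value': 'jazz'}}},
    'Q9048913': {'labels': {}},  # assumed: no English label available
}

resolved = resolve_claims(claims, entities)
print(resolved)
```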
Our get_wikidata() query returned many properties which we did not resolve:
>>> len(art.data['wikidata'])
9
>>> import json
>>> j = json.loads(art.cache['wikidata']['response'])
>>> len((j['entities']['Q311715']['claims']))
46
That is expected because we do not listen for all possible properties, as mentioned above.
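To see which returned properties were skipped, the property ids in the raw entity can be compared against the labels we listen for. A minimal sketch, assuming a hypothetical helper named unlistened and a hand-made sample entity (only the keys of its claims matter here):

```python
def unlistened(entity, labels):
    """List property ids present in the entity but absent from labels."""
    return sorted(p for p in entity.get('claims', {}) if p not in labels)


# Illustrative sample: claim bodies are left empty on purpose
entity = {'claims': {'P136': [], 'P19': [], 'P106': []}}
labels = {'P136': 'genre'}

skipped = unlistened(entity, labels)
print(skipped)
```

Any id in that list is a candidate for a new label.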
You can listen for additional Wikidata properties by extending wptools.wikidata.LABELS with update_labels():
>>> art = wptools.page('Art Blakey') # flush cache
>>> art.update_labels({'P19': 'birthplace'})
>>> art.get_wikidata()
www.wikidata.org (wikidata) Art_Blakey
www.wikidata.org (claims) Q8341|Q30|Q5|Q1342|Q9048913
en.wikipedia.org (imageinfo) File:Art Blakey08.JPG
Art_Blakey (en)
{
claims: <dict(5)> {Q1342, Q30, Q5, Q8341, Q9048913}
properties: <dict(10)> {P136, P18, P19, P27, P31, P345, P569, P570, P85...
wikidata: <dict(10)> {IMDB, birth, birthplace, category, citizensh...
wikidata_url: https://www.wikidata.org/wiki/Q311715
...
}
>>> art.wikidata['birthplace']
u'Pittsburgh'
Now we know that Art Blakey's birthplace ("Pittsburgh") is Wikidata item Q1342. We only needed to ask for property P19 ("place of birth") and assign that property a convenient label, birthplace.