conversion:uses_predicate
- Most of the terms in the conversion: vocabulary are conversion:Enhancements, but some terms are annotations that are created during the conversion. conversion:uses_predicate is one of those annotations.
- conversion:uses_predicate complements the VoID Vocabulary.
This page covers what the conversion:uses_predicate property describes, and gives a bit of background on how it is computed.
If two datasets use the same vocabulary, then there is a good chance that it will be worthwhile to combine them to get more interesting results. The conversion:uses_predicate property annotates void:Datasets with the RDF predicates that appear in the dataset's triples. For example, if the dataset http://purl.org/twc/health/source/hub-healthdata-gov/dataset/hospital-compare/version/2012-Jul-17 contains the triples:
@prefix vcard:   <http://www.w3.org/2006/vcard/ns#> .
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
<http://purl.org/twc/health/source/hub-healthdata-gov/dataset/hospital-compare/version/2012-Jul-17/provider/010001>
   prov:specializationOf   <http://logd.tw.rpi.edu/id/medicare-gov/provider/010001> ;
   vcard:organization-name "Southeast Alabama Medical Center" ;
   vcard:adr               <http://localhost/source/hub-healthdata-gov/provider/010001/address> ;
   prov:atLocation         dbpedia:Houston_County .
then the following four triples indicate which predicates the dataset uses:
<http://purl.org/twc/health/source/hub-healthdata-gov/dataset/hospital-compare/version/2012-Jul-17>
   conversion:uses_predicate prov:specializationOf, vcard:organization-name, vcard:adr, prov:atLocation .
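As a rough illustration of how such an annotation could be derived, the sketch below collects the distinct predicates from a list of triples. The `uses_predicate` function name and the tuple representation are assumptions made for this example; they are not taken from csv2rdf4lod.

```python
def uses_predicate(triples):
    """Return the sorted, distinct predicates among (subject, predicate, object) tuples."""
    return sorted({p for (_s, p, _o) in triples})

# The four triples from the example, written with prefixed names for brevity.
# "provider/010001" is a shortened stand-in for the full subject URI.
provider = "provider/010001"
triples = [
    (provider, "prov:specializationOf", "<http://logd.tw.rpi.edu/id/medicare-gov/provider/010001>"),
    (provider, "vcard:organization-name", '"Southeast Alabama Medical Center"'),
    (provider, "vcard:adr", "<http://localhost/source/hub-healthdata-gov/provider/010001/address>"),
    (provider, "prov:atLocation", "dbpedia:Houston_County"),
]

predicates = uses_predicate(triples)
print(predicates)
```

Each value in the result would become one object of conversion:uses_predicate on the dataset.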
Annotating many void:Datasets with conversion:uses_predicate allows us to quickly find datasets that share the same vocabulary. Using it lets us avoid generic queries that can take a long time for large datasets, such as:
select distinct ?p
where {
   graph <http://purl.org/twc/health/source/hub-healthdata-gov/dataset/hospital-compare/version/2012-Jul-17> {
      [] ?p []   # This can be slow and is avoided with conversion:uses_predicate.
   }
}
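To make the "shared vocabulary" idea concrete, here is a minimal sketch that pairs up datasets by the predicates they have in common, given each dataset's conversion:uses_predicate values. The `shared_predicates` helper, the two `example-*` datasets, and their predicate lists are hypothetical; only the hospital-compare predicates come from the example on this page.

```python
from itertools import combinations

def shared_predicates(annotations):
    """Map each pair of datasets to the predicates both use, skipping pairs with none."""
    shared = {}
    for a, b in combinations(sorted(annotations), 2):
        common = annotations[a] & annotations[b]
        if common:
            shared[(a, b)] = common
    return shared

# conversion:uses_predicate values keyed by a short dataset name.
# The "example-*" entries are invented for illustration.
annotations = {
    "hospital-compare": {"prov:specializationOf", "vcard:organization-name",
                         "vcard:adr", "prov:atLocation"},
    "example-clinics": {"vcard:organization-name", "vcard:adr", "dcterms:title"},
    "example-budget": {"dcterms:title", "qb:dataSet"},
}

overlaps = shared_predicates(annotations)
# List the pairs with the largest overlap first.
for pair, common in sorted(overlaps.items(), key=lambda kv: -len(kv[1])):
    print(pair, sorted(common))
```

Because the work is done on the small per-dataset annotation sets, no dataset's full graph needs to be scanned.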
Finally, conversion:uses_predicate provides an extra level of granularity beyond the void:vocabulary annotation, which only references the vocabulary (e.g. vcard, prov) and not the actual terms that are used within the vocabulary (e.g. vcard:adr, prov:specializationOf). In fact, one could derive the generic void:vocabulary assertions by processing the detailed conversion:uses_predicate annotations.
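def ns(term):
    """Remove the local name of the term, leaving only its namespace."""
    # Assumes the namespace ends at the last '#' or '/' of the term's URI.
    return term[:max(term.rfind('#'), term.rfind('/')) + 1]

for dataset in datasets:
    for used in dataset.conversion_uses_predicate:
        dataset.void_vocabulary.append(ns(used))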
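As a worked instance of that derivation, the sketch below applies the same idea to the four predicates from the hospital-compare example, written as full URIs. The `namespace_of` name and the rule "the namespace ends at the last '#' or '/'" are assumptions for illustration; real vocabularies do not always follow that rule.

```python
def namespace_of(iri):
    """Return the IRI up to and including its last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[:cut + 1]

# The predicates from the hospital-compare example, expanded to full URIs.
used = [
    "http://www.w3.org/ns/prov#specializationOf",
    "http://www.w3.org/2006/vcard/ns#organization-name",
    "http://www.w3.org/2006/vcard/ns#adr",
    "http://www.w3.org/ns/prov#atLocation",
]

# A set removes the duplicates that arise when several terms share a namespace.
vocabularies = sorted({namespace_of(p) for p in used})
print(vocabularies)
```

Four conversion:uses_predicate values collapse to two void:vocabulary values here, which shows how much detail the coarser annotation leaves out.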
One advantage of using csv2rdf4lod is that it asserts conversion:uses_predicate in the metadata that it produces when converting tabular data. This section contains notes created while reviewing how csv2rdf4lod produces the conversion:uses_predicate annotations, so that we can verify that they are complete. Ensuring completeness is issue 300.
Looking at dataset hub-healthdata-gov/hospital-compare
Verifying predicates for one:
rapper -g -o ntriples automatic/HQI_HOSP_AHRQ.csv.e1.ttl | awk '{print $2}' | sort -u > manual/e1-predicates.csv
rapper -g -o ntriples automatic/HQI_HOSP_AHRQ.csv.e1.void.ttl | awk '$2 == "<http://purl.org/twc/vocab/conversion/uses_predicate>"{print $3}' | sort -u > manual/uses-predicate.csv
diff -y -W 250 manual/uses-predicate.csv manual/e1-predicates.csv
Verifying predicates in metadata for one:
rapper -g -o ntriples automatic/HQI_HOSP_AHRQ.csv.e1.void.ttl | awk '{print $2}' | sort -u > manual/e1-meta-predicates.csv
rapper -g -o ntriples automatic/HQI_HOSP_AHRQ.csv.e1.void.ttl | awk '$2 == "<http://purl.org/twc/vocab/conversion/uses_predicate>"{print $3}' | sort -u > manual/uses-meta-predicate.csv
diff -y -W 250 manual/uses-meta-predicate.csv manual/e1-meta-predicates.csv
Verifying predicates for all:
rapper -g -o ntriples publish/hub-healthdata-gov-hospital-compare-2012-Jul-17.e1.ttl | awk '{print $2}' | sort -u > manual/e1-predicates.csv
rapper -g -o ntriples publish/hub-healthdata-gov-hospital-compare-2012-Jul-17.void.ttl | awk '$2 == "<http://purl.org/twc/vocab/conversion/uses_predicate>"{print $3}' | sort -u > manual/uses-predicate.csv
diff -y -W 250 manual/uses-predicate.csv manual/e1-predicates.csv
Verifying classes for one:
rapper -g -o ntriples automatic/HQI_HOSP_AHRQ.csv.e1.ttl | awk '$2 == "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"{print $3}' | sort -u > manual/e1-classes.csv
rapper -g -o ntriples automatic/HQI_HOSP_AHRQ.csv.e1.void.ttl | awk '$2 == "<http://purl.org/twc/vocab/conversion/uses_class>"{print $3}' | sort -u > manual/uses-class.csv
diff -y -W 250 manual/uses-class.csv manual/e1-classes.csv
Verifying classes for all:
rapper -g -o ntriples publish/hub-healthdata-gov-hospital-compare-2012-Jul-17.e1.ttl | awk '$2 == "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"{print $3}' | sort -u > manual/e1-classes.csv
rapper -g -o ntriples publish/hub-healthdata-gov-hospital-compare-2012-Jul-17.void.ttl | awk '$2 == "<http://purl.org/twc/vocab/conversion/uses_class>"{print $3}' | sort -u > manual/uses-class.csv
diff -y -W 250 manual/uses-class.csv manual/e1-classes.csv
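The same comparisons can be scripted. The sketch below is a rough Python counterpart to the awk and diff steps, assuming rapper's N-Triples output has already been captured as text. The function names and the sample triples are made up for illustration, and the whitespace split mirrors awk's field splitting rather than being a full N-Triples parser.

```python
USES_PREDICATE = "<http://purl.org/twc/vocab/conversion/uses_predicate>"

def column(ntriples, index, where_predicate=None):
    """Collect distinct values of one whitespace-separated field, like awk's $2 or $3."""
    values = set()
    for line in ntriples.splitlines():
        fields = line.split()
        if len(fields) < 4:  # skip blank or malformed lines
            continue
        if where_predicate is None or fields[1] == where_predicate:
            values.add(fields[index])
    return values

def compare(data_ntriples, void_ntriples, annotation=USES_PREDICATE):
    """Return (used but not annotated, annotated but not used)."""
    used = column(data_ntriples, 1)
    annotated = column(void_ntriples, 2, where_predicate=annotation)
    return used - annotated, annotated - used

# Invented sample data: two predicates are used, but only one is annotated.
data = """\
<http://example.org/s> <http://www.w3.org/ns/prov#atLocation> <http://example.org/o> .
<http://example.org/s> <http://www.w3.org/2006/vcard/ns#adr> <http://example.org/a> .
"""
void = """\
<http://example.org/dataset> <http://purl.org/twc/vocab/conversion/uses_predicate> <http://www.w3.org/ns/prov#atLocation> .
"""

missing, extra = compare(data, void)
print("missing from annotations:", sorted(missing))
print("annotated but unused:", sorted(extra))
```

Two empty sets correspond to a diff with no differences, meaning the annotations are complete for that file.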