conversion:uses_predicate

timrdf edited this page Sep 25, 2012 · 31 revisions

What is first

  • Most of the terms in the conversion: vocabulary are conversion:Enhancements, but some terms are annotations that are created during the conversion. conversion:uses_predicate is one of those annotations.
  • conversion:uses_predicate complements the VoID Vocabulary.

What we will cover

This page will cover what the conversion:uses_predicate property describes, and a bit of background on how it is computed.

Let's get to it!

If two datasets use the same vocabulary, then there is a good chance that it will be worthwhile to combine them to get more interesting results. The conversion:uses_predicate property annotates void:Datasets with the RDF predicates that appear in the dataset's triples. For example, if the dataset http://purl.org/twc/health/source/hub-healthdata-gov/dataset/hospital-compare/version/2012-Jul-17 contains the triples:

@prefix vcard:   <http://www.w3.org/2006/vcard/ns#> .
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .

<http://purl.org/twc/health/source/hub-healthdata-gov/dataset/hospital-compare/version/2012-Jul-17/provider/010001> 
   prov:specializationOf  <http://logd.tw.rpi.edu/id/medicare-gov/provider/010001> ;
   vcard:organization-name "Southeast Alabama Medical Center" ;
   vcard:adr <http://localhost/source/hub-healthdata-gov/provider/010001/address> ;
   prov:atLocation dbpedia:Houston_County .

then the following four triples state which predicates the dataset uses:

<http://purl.org/twc/health/source/hub-healthdata-gov/dataset/hospital-compare/version/2012-Jul-17>
   conversion:uses_predicate prov:specializationOf, vcard:organization-name, vcard:adr, prov:atLocation .

Annotating many void:Datasets with conversion:uses_predicate allows us to quickly find datasets that share the same vocabulary. Using it lets us avoid generic queries that can take a long time for large datasets, such as:

select distinct ?p
where {
  graph <http://purl.org/twc/health/source/hub-healthdata-gov/dataset/hospital-compare/version/2012-Jul-17> {
     [] ?p [] # This can be slow and is avoided with conversion:uses_predicate.
  }
}
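
Once the annotations exist, finding datasets with overlapping vocabulary reduces to set intersection. The following is a minimal sketch, not part of csv2rdf4lod: it assumes the conversion:uses_predicate values have already been loaded into a plain dictionary, and the function name shared_predicates and the second dataset URI are hypothetical.

```python
from itertools import combinations

def shared_predicates(uses_predicate):
    """Map each pair of datasets to the predicates that both use.

    uses_predicate: dict of dataset URI -> set of predicate URIs
    (assumed to be loaded from conversion:uses_predicate annotations).
    """
    shared = {}
    for a, b in combinations(sorted(uses_predicate), 2):
        common = uses_predicate[a] & uses_predicate[b]
        if common:
            shared[(a, b)] = common
    return shared

VCARD = "http://www.w3.org/2006/vcard/ns#"
PROV = "http://www.w3.org/ns/prov#"

# Illustrative data: the first entry mirrors the example above;
# the second dataset URI is made up.
annotations = {
    "http://purl.org/twc/health/source/hub-healthdata-gov/dataset/hospital-compare/version/2012-Jul-17": {
        PROV + "specializationOf",
        VCARD + "organization-name",
        VCARD + "adr",
        PROV + "atLocation",
    },
    "http://example.org/dataset/other": {
        VCARD + "adr",
        PROV + "atLocation",
        "http://purl.org/dc/terms/title",
    },
}

for (a, b), common in shared_predicates(annotations).items():
    print(a, b, sorted(common))
```

Because only the small annotation sets are compared, no dataset's triples need to be scanned.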

Finally, conversion:uses_predicate is finer-grained than the void:vocabulary annotation, which references only the vocabulary (e.g. vcard, prov) and not the terms that are used within that vocabulary (e.g. vcard:adr, prov:specializationOf). In fact, one could derive the generic void:vocabulary assertions by processing the detailed conversion:uses_predicate annotations, for example:

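def ns(term):
   """Remove the local name of the term, leaving its namespace."""
   return term[:max(term.rfind('#'), term.rfind('/')) + 1]

# Assumes each dataset object exposes conversion_uses_predicate and void_vocabulary lists.
for dataset in datasets:
   for used in dataset.conversion_uses_predicate:
      if ns(used) not in dataset.void_vocabulary:  # skip namespaces already recorded
         dataset.void_vocabulary.append(ns(used))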

Verifying csv2rdf4lod's implementation of conversion:uses_predicate

One advantage of using csv2rdf4lod is that it asserts conversion:uses_predicate in the metadata that it produces when converting tabular data. This section contains notes made while reviewing how csv2rdf4lod produces the conversion:uses_predicate annotations, so that we can verify that they are complete. Ensuring completeness is tracked as issue 300.

Looking at dataset hub-healthdata-gov/hospital-compare

Verifying predicates for one file:

rapper -g -o ntriples automatic/HQI_HOSP_AHRQ.csv.e1.ttl | awk '{print $2}' | sort -u > manual/e1-predicates.csv
rapper -g -o ntriples automatic/HQI_HOSP_AHRQ.csv.e1.void.ttl | awk '$2 == "<http://purl.org/twc/vocab/conversion/uses_predicate>"{print $3}' | sort -u > manual/uses-predicate.csv
diff -y -W 250 manual/uses-predicate.csv manual/e1-predicates.csv

Verifying predicates in metadata for one file:

rapper -g -o ntriples automatic/HQI_HOSP_AHRQ.csv.e1.void.ttl | awk '{print $2}' | sort -u > manual/e1-meta-predicates.csv
rapper -g -o ntriples automatic/HQI_HOSP_AHRQ.csv.e1.void.ttl | awk '$2 == "<http://purl.org/twc/vocab/conversion/uses_predicate>"{print $3}' | sort -u > manual/uses-predicate.csv
diff -y -W 250 manual/uses-predicate.csv manual/e1-meta-predicates.csv

Verifying predicates for all files:

rapper -g -o ntriples publish/hub-healthdata-gov-hospital-compare-2012-Jul-17.e1.ttl | awk '{print $2}' | sort -u > manual/e1-predicates.csv
rapper -g -o ntriples publish/hub-healthdata-gov-hospital-compare-2012-Jul-17.void.ttl | awk '$2 == "<http://purl.org/twc/vocab/conversion/uses_predicate>"{print $3}' | sort -u > manual/uses-predicate.csv
diff -y -W 250 manual/uses-predicate.csv manual/e1-predicates.csv
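
The comparison performed by the awk, sort, and diff pipelines above can also be sketched in Python. This is only an illustration under stated assumptions: the input is taken to be N-Triples with one triple per line (as rapper emits), the function names are hypothetical, and the example.org URIs in the sample data are made up.

```python
USES_PREDICATE = "<http://purl.org/twc/vocab/conversion/uses_predicate>"

def predicates_used(ntriples_lines):
    """Collect the second field of every triple, like awk '{print $2}' | sort -u."""
    found = set()
    for line in ntriples_lines:
        fields = line.split()
        if len(fields) >= 3 and not line.lstrip().startswith("#"):
            found.add(fields[1])
    return found

def predicates_asserted(void_lines):
    """Collect the objects of conversion:uses_predicate triples."""
    found = set()
    for line in void_lines:
        fields = line.split()
        if len(fields) >= 3 and fields[1] == USES_PREDICATE:
            found.add(fields[2])
    return found

def compare(data_lines, void_lines):
    """Return (missing, extra): used but not asserted, asserted but not used."""
    used = predicates_used(data_lines)
    asserted = predicates_asserted(void_lines)
    return used - asserted, asserted - used

# Made-up sample input standing in for rapper's N-Triples output.
data = [
    '<http://example.org/provider/010001> <http://www.w3.org/2006/vcard/ns#organization-name> "Southeast Alabama Medical Center" .',
    '<http://example.org/provider/010001> <http://www.w3.org/ns/prov#atLocation> <http://dbpedia.org/resource/Houston_County> .',
]
void = [
    '<http://example.org/dataset> <http://purl.org/twc/vocab/conversion/uses_predicate> <http://www.w3.org/2006/vcard/ns#organization-name> .',
]

missing, extra = compare(data, void)
print("used but not asserted:", sorted(missing))
print("asserted but not used:", sorted(extra))
```

An empty "used but not asserted" set corresponds to the diff showing no predicates that appear only in the e1 file.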

Places where predicates are asserted:

  • CSVtoRDF.visit - this is the central place where the predicate is known.
    • primary.add(subjectR, DCTerms.isReferencedBy, this.versionedDatasetR);
    • primary.add(subjectR, VoID.inDataset, this.versionedDatasetR);
    • primary.add(subjectR, RDF.TYPE, subjectRowTypeR);
  • Enhancement parameters - anything not in conversion:?

What is next