
Querying HTTP redirect provenance to find new dataset versions

Timothy Lebo edited this page Feb 14, 2012 · 19 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

(Earlier work, done after the initial provenance crawl, is in this demo.)

Find the latest two versions of the HTTP redirect crawls (results):

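PREFIX dcterms:    <http://purl.org/dc/terms/>
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>
PREFIX void:       <http://rdfs.org/ns/void#>
# void:subset links the abstract crawl dataset to each of its dated versions
SELECT DISTINCT ?versioned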
WHERE {
  GRAPH <http://logd.tw.rpi.edu/vocab/Dataset> {
    <http://logd.tw.rpi.edu/source/twc-rpi-edu/dataset/data-pointer-http-headers> void:subset ?versioned .
    optional { ?versioned dcterms:modified ?modified }
  }
} order by desc(?modified) limit 2
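The queries that follow hard-code a version graph such as `.../data-pointer-http-headers/version/2011-Mar-05`. The sketch below picks the newest version at query time instead. It assumes the endpoint supports SPARQL 1.1 subqueries, and it assumes that each versioned dataset URI returned by the query above is also the name of the graph holding that version's redirect triples. The hard-coded graph names follow that pattern, but this page does not confirm it.

```sparql
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX void:    <http://rdfs.org/ns/void#>
PREFIX irw:     <http://www.ontologydesignpatterns.org/ont/web/irw.owl#>

SELECT ?latest ?url ?next
WHERE {
  # Inner query: the single most recently modified version of the crawl dataset.
  {
    SELECT ?latest
    WHERE {
      GRAPH <http://logd.tw.rpi.edu/vocab/Dataset> {
        <http://logd.tw.rpi.edu/source/twc-rpi-edu/dataset/data-pointer-http-headers> void:subset ?latest .
        ?latest dcterms:modified ?modified .
      }
    }
    ORDER BY DESC(?modified)
    LIMIT 1
  }
  # Assumption: the version URI is also the name of the graph with its redirect triples.
  GRAPH ?latest {
    ?url irw:redirectsTo ?next .
  }
}
LIMIT 25
```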

Redirect chain for retrieving data-gov dataset 1554 (results):

PREFIX dcterms:    <http://purl.org/dc/terms/>
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>
PREFIX irw:        <http://www.ontologydesignpatterns.org/ont/web/irw.owl#>

SELECT *
WHERE {
  GRAPH <http://logd.tw.rpi.edu/source/twc-rpi-edu/dataset/data-pointer-http-headers/version/2011-Mar-05> {
    { ?a irw:refersTo ?refersTo }
    union
    { ?b irw:redirectsTo ?redirectsTo }
  }
  filter(regex(str(?a),'1554') || regex(str(?b),'1554') || 
         regex(str(?refersTo),'1554') || regex(str(?redirectsTo),'1554'))
}
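The union above returns each hop as a separate, unconnected row. The sketch below walks the chain with a SPARQL 1.1 property path and lists every URI reachable from a starting URI that mentions 1554. It assumes the endpoint supports property paths (`|` and `+`). It also replaces `regex` with `CONTAINS` for the plain substring test. Intermediate URIs that also contain 1554 show up as their own `?start`, so sub-chains appear as extra rows.

```sparql
PREFIX irw: <http://www.ontologydesignpatterns.org/ont/web/irw.owl#>

SELECT DISTINCT ?start ?reached
WHERE {
  GRAPH <http://logd.tw.rpi.edu/source/twc-rpi-edu/dataset/data-pointer-http-headers/version/2011-Mar-05> {
    # One or more hops, each either a redirect or a refersTo link.
    ?start (irw:redirectsTo|irw:refersTo)+ ?reached .
  }
  # Anchor the walk at URIs that mention the dataset number.
  FILTER(CONTAINS(STR(?start), "1554"))
}
ORDER BY ?start ?reached
```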

Where we started the chain (results):

PREFIX ov:         <http://open.vocab.org/terms/>
PREFIX dcterms:    <http://purl.org/dc/terms/>
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>
PREFIX irw:        <http://www.ontologydesignpatterns.org/ont/web/irw.owl#>

SELECT *
WHERE {
  GRAPH <http://purl.org/twc/vocab/conversion/MetaDataset> {
    { ?dataset ov:csvRow []; ?p ?url }
  }
  filter(regex(str(?dataset),'1554') || regex(str(?url),'1554'))
}
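The later queries start every chain from `foaf:homepage`. The sketch below is one way to check which metadata properties actually lead into the crawl. It joins the MetaDataset graph with the 2011-Mar-05 crawl graph and counts, per property, the datasets whose value for that property was found redirecting. It is an exploratory query; no results are reported here.

```sparql
PREFIX ov:  <http://open.vocab.org/terms/>
PREFIX irw: <http://www.ontologydesignpatterns.org/ont/web/irw.owl#>

SELECT ?p (COUNT(DISTINCT ?dataset) AS ?datasets)
WHERE {
  GRAPH <http://purl.org/twc/vocab/conversion/MetaDataset> {
    ?dataset ov:csvRow [] ; ?p ?url .
  }
  GRAPH <http://logd.tw.rpi.edu/source/twc-rpi-edu/dataset/data-pointer-http-headers/version/2011-Mar-05> {
    # Keep only values that the crawler found redirecting somewhere.
    ?url irw:redirectsTo ?next .
  }
}
GROUP BY ?p
ORDER BY DESC(?datasets)
```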

Dataset URIs and their final data file source (results):

NOTE: A transitive (chained) property from ?details to ?actual_source would be very beneficial; we should not be hard-coding this three-hop path.

PREFIX dcterms:    <http://purl.org/dc/terms/>
PREFIX ov:         <http://open.vocab.org/terms/>
PREFIX irw:        <http://www.ontologydesignpatterns.org/ont/web/irw.owl#>
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>
PREFIX foaf:       <http://xmlns.com/foaf/0.1/>

SELECT ?dataset ?actual_source
WHERE {
  GRAPH <http://purl.org/twc/vocab/conversion/MetaDataset> {
    ?dataset foaf:homepage ?details .
  }
  GRAPH <http://logd.tw.rpi.edu/source/twc-rpi-edu/dataset/data-pointer-http-headers/version/2011-Mar-05> {
    ?details  irw:redirectsTo ?raw .
    ?raw      irw:refersTo    ?download .
    ?download irw:redirectsTo ?actual_source .
  }
}
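The NOTE above asks for a transitive alternative. The sketch below uses a SPARQL 1.1 property path instead of the fixed redirectsTo / refersTo / redirectsTo sequence. It follows any mix of the two properties and keeps only the URI at the end of the chain, meaning one with no further outgoing hop. This assumes the endpoint supports property paths and `FILTER NOT EXISTS`. A chain that ends in a redirect cycle has no such end point and yields no row.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX irw:  <http://www.ontologydesignpatterns.org/ont/web/irw.owl#>

SELECT ?dataset ?actual_source
WHERE {
  GRAPH <http://purl.org/twc/vocab/conversion/MetaDataset> {
    ?dataset foaf:homepage ?details .
  }
  GRAPH <http://logd.tw.rpi.edu/source/twc-rpi-edu/dataset/data-pointer-http-headers/version/2011-Mar-05> {
    # Follow one or more hops of either kind, in any order.
    ?details (irw:redirectsTo|irw:refersTo)+ ?actual_source .
    # Keep only the end of the chain: a URI with no further hop.
    FILTER NOT EXISTS { ?actual_source irw:redirectsTo|irw:refersTo [] }
  }
}
```

If the exact three-hop shape should be kept, the hard-coded path can also be written as the sequence path `?details irw:redirectsTo/irw:refersTo/irw:redirectsTo ?actual_source`. Unlike the transitive form, that version still misses chains that are shorter or longer than three hops.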

Pulling in the claimed sources (the agency recorded in the metadata) alongside the actual sources (results):

PREFIX dcterms:    <http://purl.org/dc/terms/>
PREFIX ov:         <http://open.vocab.org/terms/>
PREFIX irw:        <http://www.ontologydesignpatterns.org/ont/web/irw.owl#>
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>
PREFIX e1:         <http://logd.tw.rpi.edu/source/data-gov/dataset/92/vocab/enhancement/1/>
PREFIX foaf:       <http://xmlns.com/foaf/0.1/>

SELECT ?dataset ?claimed_source ?actual_source
WHERE {
  GRAPH <http://purl.org/twc/vocab/conversion/MetaDataset> {
    ?dataset foaf:homepage ?details; e1:agency ?claimed_source .
  }
  GRAPH <http://logd.tw.rpi.edu/source/twc-rpi-edu/dataset/data-pointer-http-headers/version/2011-Mar-05> {
    ?details  irw:redirectsTo ?raw .
    ?raw      irw:refersTo    ?download .
    ?download irw:redirectsTo ?actual_source .
  }
} order by ?claimed_source ?actual_source
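The sketch below summarizes where each claimed agency's data is actually served from by grouping the same join on the host of the final URL. Extracting the host with `REPLACE` over the URI string is an assumption of this sketch; the vocabulary does not provide a host property. It also assumes a SPARQL 1.1 endpoint with `BIND` and aggregates.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX irw:  <http://www.ontologydesignpatterns.org/ont/web/irw.owl#>
PREFIX e1:   <http://logd.tw.rpi.edu/source/data-gov/dataset/92/vocab/enhancement/1/>

SELECT ?claimed_source ?host (COUNT(DISTINCT ?dataset) AS ?datasets)
WHERE {
  GRAPH <http://purl.org/twc/vocab/conversion/MetaDataset> {
    ?dataset foaf:homepage ?details ; e1:agency ?claimed_source .
  }
  GRAPH <http://logd.tw.rpi.edu/source/twc-rpi-edu/dataset/data-pointer-http-headers/version/2011-Mar-05> {
    ?details  irw:redirectsTo ?raw .
    ?raw      irw:refersTo    ?download .
    ?download irw:redirectsTo ?actual_source .
  }
  # Reduce each final URL to its host name, e.g. http://example.org/a/b becomes example.org.
  BIND(REPLACE(STR(?actual_source), "^[a-zA-Z][a-zA-Z0-9+.-]*://([^/:?#]+).*$", "$1") AS ?host)
}
GROUP BY ?claimed_source ?host
ORDER BY ?claimed_source DESC(?datasets)
```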

(DRAFT) Where do the final redirects change between the 2011-Mar-02 and 2011-Mar-05 crawls? (results):

PREFIX dcterms:    <http://purl.org/dc/terms/>
PREFIX ov:         <http://open.vocab.org/terms/>
PREFIX irw:        <http://www.ontologydesignpatterns.org/ont/web/irw.owl#>
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>
PREFIX foaf:       <http://xmlns.com/foaf/0.1/>

SELECT ?dataset ?actual_source1 ?actual_source2
WHERE {
  GRAPH <http://purl.org/twc/vocab/conversion/MetaDataset> {
    ?dataset foaf:homepage ?details .
  }
  GRAPH <http://logd.tw.rpi.edu/source/twc-rpi-edu/dataset/data-pointer-http-headers/version/2011-Mar-05> {
   ?details   irw:redirectsTo  ?raw1 .
   ?raw1      irw:refersTo    ?download1 .
   ?download1 irw:redirectsTo ?actual_source1 .
  }
  OPTIONAL {
    GRAPH <http://logd.tw.rpi.edu/source/twc-rpi-edu/dataset/data-pointer-http-headers/version/2011-Mar-02> {
      ?details   irw:redirectsTo ?raw2 .
      ?raw2      irw:refersTo    ?download2 .
      ?download2 irw:redirectsTo ?actual_source2 .
    }
  }
}
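As written, the draft returns every dataset, whether or not its final source changed. The sketch below keeps only the rows where the 2011-Mar-02 chain was not found or ended at a different URI. It uses the same hard-coded three-hop path. A dataset with several chains in the older crawl can still produce several rows.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX irw:  <http://www.ontologydesignpatterns.org/ont/web/irw.owl#>

SELECT ?dataset ?actual_source1 ?actual_source2
WHERE {
  GRAPH <http://purl.org/twc/vocab/conversion/MetaDataset> {
    ?dataset foaf:homepage ?details .
  }
  # Newer crawl: the chain must exist.
  GRAPH <http://logd.tw.rpi.edu/source/twc-rpi-edu/dataset/data-pointer-http-headers/version/2011-Mar-05> {
    ?details   irw:redirectsTo ?raw1 .
    ?raw1      irw:refersTo    ?download1 .
    ?download1 irw:redirectsTo ?actual_source1 .
  }
  # Older crawl: the chain may be missing.
  OPTIONAL {
    GRAPH <http://logd.tw.rpi.edu/source/twc-rpi-edu/dataset/data-pointer-http-headers/version/2011-Mar-02> {
      ?details   irw:redirectsTo ?raw2 .
      ?raw2      irw:refersTo    ?download2 .
      ?download2 irw:redirectsTo ?actual_source2 .
    }
  }
  # Keep rows where the old chain is absent or ends somewhere else.
  FILTER(!BOUND(?actual_source2) || !sameTerm(?actual_source1, ?actual_source2))
}
ORDER BY ?dataset
```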