conversion:Enhancement

Tim L edited this page Sep 6, 2013 · 92 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

csv2rdf4lod automates the invocation of a converter that is controlled using explicit specifications described by a conversion vocabulary whose namespace is http://purl.org/twc/vocab/conversion/. This is done to minimize human error, increase consistency and quality of the resulting RDF representation, and enable transparency and accessibility using provenance. It also provides interpretation metadata that can be efficiently queried to increase discoverability and [reapplied](Reusing enhancement parameters for multiple versions or datasets) to other datasets that share similar structures.

These enhancement parameters have annealed over the past four years as they have been applied to hand-curate hundreds of datasets from dozens of source organizations. Our experience indicates that a small set of RDFS-inspired principles applies broadly across wide structural variability and offers significant advantages to both the curator and the subsequent data consumer.

Enhancements in "SPO" order

Although order does not matter for the predicates on a conversion:Enhancement, the following order is suggested to align with the order in which they affect the triple asserted. Note that this is FULLY conceptual; order of enhancement specification (and application) is irrelevant. This order could be used when designing a GUI for constructing the enhancement parameters.


  • ov:csvCol - referencing the column that this enhancement affects.
  • conversion:fromCol / conversion:toCol - shorthand for referencing more than one column on a single enhancement.
  • conversion:property_name - an alternative way to reference the column via its resulting predicate's local name. NOTE: not implemented.

  • ov:csvHeader - PURELY an (OWL) annotation property; a "poor-man's provenance" retrieved from the CSV header to aid identification among 1) the original data file, 2) the enhancements modifying it, and 3) its resulting instance data. The converter does not look for this value, nor does it behave differently with or without it or when it changes. It exists only for human reference. See conversion:label.
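
The conversion:fromCol / conversion:toCol shorthand can be pictured as expanding into one column reference per covered column. The following is a minimal Python sketch of that idea, not converter code: `expand_columns` is a hypothetical helper, and treating both bounds as inclusive is an assumption that this page does not confirm.

```python
def expand_columns(from_col, to_col):
    """Return the column numbers covered by a fromCol/toCol pair.

    Assumption (not confirmed by this page): both bounds are inclusive.
    Hypothetical helper for illustration; not part of csv2rdf4lod.
    """
    if from_col > to_col:
        raise ValueError("fromCol must not exceed toCol")
    return list(range(from_col, to_col + 1))


# One enhancement spanning columns 3 through 6 stands in for four
# separate enhancements that would each name a single ov:csvCol.
print(expand_columns(3, 6))
```

Under that assumption, a single enhancement with the shorthand describes the same columns as several enhancements that each carry one ov:csvCol.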

Enhancements that affect the subject of the triple produced:


Enhancements that affect the predicate of the triple produced:

  • conversion:equivalent_property - used to specify an external URI for this column, without the local-external redundancy of subproperty_of.
  • conversion:label - will become the rdfs:label of the predicate created for the triples instantiated by the current column.
  • conversion:comment - will become the rdfs:comment of the predicate created for the triples instantiated by the current column.
  • conversion:subproperty_of - identifies additional predicates to use for the triples instantiated by the current column.

Enhancements that affect the object of the triple produced:


Enhancements that affect the descriptions of the object of the triple produced:

  • conversion:range - {rdfs:Literal, rdfs:Resource, xsd:integer, xsd:decimal}
  • a conversion:Unlabeled - suppresses rdfs:labels on resources promoted from a particular column.
  • conversion:multiplier - for any numeric conversion:range
  • conversion:range_name - to specify a class name to type resources promoted from cell values. A local class URI is constructed from this label.
  • conversion:links_via - cites lod-link graphs that can be used to assert owl:sameAs from the subject or object of a triple created during conversion.
  • conversion:predicate - (when ov:csvCol > 0) provides an arbitrary predicate for an additional description of the resource object created.
  • conversion:object - (when ov:csvCol > 0) provides an arbitrary object for an additional description of the resource object created.
  • conversion:object_label_property - specifies additional properties to assert for the label of a promoted resource object (in addition to the rdfs:label and dcterms:identifier).
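
If the suggested order were used to lay out a GUI or to pretty-print an enhancement, it could be applied as a simple sort key. The sketch below is only an illustration under stated assumptions: `SUGGESTED_ORDER` copies the predicates in the order they are named on this page (the type conversion:Unlabeled is left out because it is not a predicate), and `sort_predicates` is a hypothetical helper, not part of csv2rdf4lod.

```python
# Predicates in the order they are listed on this page.
SUGGESTED_ORDER = [
    # which column(s) the enhancement refers to
    "ov:csvCol",
    "conversion:fromCol",
    "conversion:toCol",
    "conversion:property_name",
    "ov:csvHeader",
    # predicate of the triple
    "conversion:equivalent_property",
    "conversion:label",
    "conversion:comment",
    "conversion:subproperty_of",
    # descriptions of the object of the triple
    "conversion:range",
    "conversion:multiplier",
    "conversion:range_name",
    "conversion:links_via",
    "conversion:predicate",
    "conversion:object",
    "conversion:object_label_property",
]

_RANK = {name: position for position, name in enumerate(SUGGESTED_ORDER)}


def sort_predicates(predicates):
    """Sort predicate names into the suggested order.

    Names that are not listed on this page go last; because sorted()
    is stable, they keep their original relative order.
    """
    return sorted(predicates, key=lambda name: _RANK.get(name, len(SUGGESTED_ORDER)))


print(sort_predicates(["conversion:range", "conversion:label", "ov:csvCol"]))
```

Since the order is purely conceptual, a sort like this would change only how an enhancement is displayed, not how it is applied.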

Enhancements by function

Notes

conversion:includes should be placed in the position corresponding to the types of enhancements it is including. For example, if it is including conversion:symbol/conversion:interpretation pairs, place it at the position that conversion:interpret would go.

Enhancement parameters is a less organized listing of the same enhancements shown here.

Historical note

  • Enhancement Parameters Reference was the original one-stop shop for the enhancements that could be performed, but its MediaWiki syntax does not render well on GitHub. So, I'm breaking it up into this page and a page for each enhancement.