conversion:Enhancement
csv2rdf4lod automates the invocation of a converter that is controlled using explicit specifications described by a conversion vocabulary whose namespace is http://purl.org/twc/vocab/conversion/. This is done to minimize human error, increase consistency and quality of the resulting RDF representation, and enable transparency and accessibility using provenance. It also provides interpretation metadata that can be efficiently queried to increase discoverability and [reapplied](Reusing enhancement parameters for multiple versions or datasets) to other datasets that share similar structures.
These enhancement parameters have annealed over the past two years as they have been applied to hand-curate hundreds of datasets from dozens of source organizations. Our experience indicates that a small set of RDFS-inspired principles applies broadly across wide structural variability and provides significant advantages for both the curator and the subsequent data consumer.
Although order does not matter for the predicates on a conversion:Enhancement, the following order is suggested to align with the order in which they affect the triple asserted. Note that this is FULLY conceptual; order of enhancement specification (and application) is irrelevant. This order could be used when designing a GUI for constructing the enhancement parameters.
- ov:csvCol - references the column that this enhancement affects.
- conversion:property_name - an alternative way to reference the column via its resulting predicate's local name. NOTE: not implemented.
- ov:csvHeader - PURELY an (OWL) annotation property; a "poor-man's provenance" retrieved from the CSV header to aid identification among 1) the original data file, 2) the enhancements modifying it, and 3) its resulting instance data. The converter does not look for this, nor does it behave differently with or without this value or when this value changes. It exists only for human reference. See conversion:label.
Enhancements that affect the subject of the triple produced:
- a conversion:{Omitted, Only_if_column, DataStartRow} - aborts the assertion of a triple 1) always or 2) if the cell in the current column is empty, respectively.
- a conversion:{ExampleResource, SubjectAnnotation} - flags a row as containing an exemplary resource that should become a void:exampleResource. TODO: describe subject annotation.
- conversion:bundled_by - changes the "location" from which to draw the subject of the triples instantiated by the current column.
- conversion:domain_template - changes the URI used to name the subject (see DEPRECATING: conversion:domain_template).
- conversion:domain_name - names the rdfs:label of the rdf:type of the subject; the class URI is constructed from this label.
- conversion:predicate - (when ov:csvCol 0) provides an arbitrary predicate for an additional description of the resource subject created.
- conversion:object - (when ov:csvCol 0) provides an arbitrary object for an additional description of the resource subject created.
- conversion:object_search - annotates the subject by searching the object literal.
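The two abort rules described for conversion:Omitted and conversion:Only_if_column can be pictured with a small Python sketch. This is only an illustrative model of the behavior stated above, not the converter's code: the function name `should_assert_triple`, the representation of enhancement types as a set of strings, and the choice to treat whitespace-only cells as empty are all assumptions. DataStartRow is left out because its behavior is not described here.

```python
def should_assert_triple(cell_value, enhancement_types):
    """Return False when the described rules would abort the triple for this cell.

    cell_value: the raw string from the current column (hypothetical input shape).
    enhancement_types: set of local names such as {"Omitted"} (hypothetical).
    """
    if "Omitted" in enhancement_types:
        return False  # rule 1: always abort the assertion
    if "Only_if_column" in enhancement_types and cell_value.strip() == "":
        # rule 2: abort when the current cell is empty
        # (treating whitespace-only as empty is an assumption)
        return False
    return True
```

Under this model, a column typed as Omitted never yields a triple, while a column typed as Only_if_column yields one only for non-empty cells.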
Enhancements that affect the predicate of the triple produced:
- conversion:equivalent_property - used to specify an external URI for this column, without the local-external redundancy of subproperty_of.
- conversion:label - will become the rdfs:label of the predicate created for the triples instantiated by the current column.
- conversion:comment - will become the rdfs:comment of the predicate created for the triples instantiated by the current column.
- conversion:subproperty_of - identifies additional predicates to use for the triples instantiated by the current column.
Enhancements that affect the object of the triple produced:
- a conversion:{LargeValue}
- conversion:eg - gives an example value from a cell in the column. Present only for human reference.
- a conversion:{Repeat_previous_if_empty_column}
- conversion:interpret
- conversion:{date_pattern, datetime_pattern}
- conversion:delimits_object - specifies a delimiter regex to parse the input value into multiple objects.
- conversion:object - provides the template for the up-value in cell based conversions.
- conversion:range_template - changes the name of the object.
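As a rough illustration of what conversion:delimits_object describes (one cell value parsed into several objects by a delimiter regex), the following Python sketch splits a cell with `re.split`. It is not the converter's implementation: the function name `delimit_objects` is hypothetical, and trimming whitespace and dropping empty pieces are assumptions made for the example.

```python
import re


def delimit_objects(cell_value, delimiter_regex):
    """Split one cell value into multiple object values using a delimiter regex."""
    pieces = re.split(delimiter_regex, cell_value)
    # Assumption: surrounding whitespace is trimmed and empty pieces are skipped.
    return [piece.strip() for piece in pieces if piece.strip()]


# Each returned value would become the object of its own triple.
colors = delimit_objects("red; green;blue", r";\s*")
```

In this model, a single row whose cell holds "red; green;blue" would produce three triples sharing the same subject and predicate, one per resulting value.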
Enhancements that affect the descriptions of the object of the triple produced:
- conversion:range - {rdfs:Literal, rdfs:Resource, xsd:integer, xsd:decimal}
- conversion:multiplier - for any numeric conversion:range
- conversion:range_name - names the rdfs:label of the rdf:type of the object; the class URI is constructed from this label.
- conversion:links_via - cites lod-link files that can be used to assert owl:sameAs from the subject or object.
- conversion:subject_of - identifies the predicate in the lod-link file that should behave as an owl:InverseFunctionalProperty.
- conversion:predicate - (when ov:csvCol >0) provides an arbitrary predicate for an additional description of the resource object created.
- conversion:object - (when ov:csvCol >0) provides an arbitrary object for an additional description of the resource object created.
- conversion:class_name - cites the label of a local class (created by another enhancement) that becomes rdfs:subClassOf the class cited by conversion:subclass_of.
- conversion:subclass_of - the URI or template citing the superclass of a local class.
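To make the interplay of conversion:range and conversion:multiplier more concrete, here is a minimal Python sketch that casts a cell to a numeric range and then scales it. It is a conceptual model only: the function name `interpret_numeric`, the string-based range names, and the order of operations (cast first, then multiply) are assumptions, not the converter's documented behavior.

```python
from decimal import Decimal


def interpret_numeric(cell_value, range_name, multiplier=None):
    """Cast a cell to the numeric range named, then apply an optional multiplier."""
    cleaned = cell_value.strip()
    if range_name == "xsd:integer":
        value = int(cleaned)
    elif range_name == "xsd:decimal":
        value = Decimal(cleaned)  # Decimal avoids binary floating-point rounding
    else:
        raise ValueError("multiplier applies only to numeric ranges: " + range_name)
    if multiplier is not None:
        value = value * multiplier
    return value
```

For example, a column that reports values in thousands could pair an xsd:integer range with a multiplier of 1000, so that a cell containing "12" is modeled as 12000.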
- Structural conversion:Enhancements
- Enhancements that modify the Subject of a triple
- Enhancements that modify the Predicate of a triple
- Enhancements that modify the Object of a triple
- Vocabulary conversion:Enhancements
- Enhancements that provide owl:sameAs triples
conversion:includes should be placed in the position corresponding to the types of enhancements it is including. For example, if it is including conversion:symbol/conversion:interpretation pairs, place it at the position that conversion:interpret would go.
Enhancement parameters is a less organized listing of the same enhancements shown here.
- Enhancement Parameters Reference was the original one-stop-shop for the enhancements that could be performed, but the wikimedia syntax isn't too happy on github. So, I'm breaking it up into this page and pages for each enhancement.