SDV organization

SDV organization uses three aspects of a dataset ("source", "dataset", and "version") to organize:

... the many datasets that a source may have, and
... the many versions that a source may issue for a particular dataset.

Definitions for each of the three aspects:

Source, the agent (person, organization) providing the dataset.
Dataset, an abstract portion of all the agent’s data.
Version, a concrete portion of an agent’s abstract dataset.

The following pages describe the basics of applying "SDV" to organize others' datasets.

Conversion process phase: name
Conversion process phase: retrieve
When using the file system to organize data, we use some Directory Conventions.

The following pages describe how SDV organization is used to automatically create new dataset versions.

When other systems are situated within SDV organization, they can leverage the naming convention in their processing and results. The following lists the parameter names used to provide the SDV aspects to the systems during invocation.

Invoking csv2rdf4lod was the original "SDV situated" process.
- Uses a shell variable to determine the converter, defaulting to edu.rpi.tw.data.csv.CSVtoRDF (here)
- Invokes it as $csv2rdf with a pile of arguments (here)
Situating a FAqT Brick into csv2rdf4lod automation
Situating a visual strategy into csv2rdf4lod automation
- visual-artifact-uri= with value from "cr-dataset-uri.sh --uri"
VSR's Content augmentation
Situating a data carver session into csv2rdf4lod automation uses the following input arguments
- --cr-base-uri=http://ieeevis.tw.rpi.edu (== CSV2RDF4LOD_BASE_URI)
- --cr-conversion-root=/Users/me/projects/twc-ieeevis/data/source (--cr-data-root is a synonym.)
- --cr-source-id=ieeevis-tw-rpi-edu
- --cr-dataset-id=data-carves
- --cr-version-id=experiment-1 (optional, if omitted the called system should provide a default)
- Or, --cr-source-id, --cr-dataset-id, and --cr-version-id can be packed into the single --cr-dataset-dir param, which may point at any depth of the hierarchy:
- --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu
- --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves
- --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves/version
- --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves/version/experiment-1
- --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves/version/experiment-1/source
- --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves/version/experiment-1/manual
- --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves/version/experiment-1/automatic
- --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves/version/experiment-1/source/part-1
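The example paths above suggest that a called system can recover the SDV identifiers from --cr-dataset-dir by reading the path relative to the conversion root. The following is a minimal sketch of that idea, not the csv2rdf4lod implementation: the function name `unpack_dataset_dir` and the `SDV` tuple are hypothetical, and the layout `<root>/<source-id>/<dataset-id>/version/<version-id>/...` is inferred only from the examples listed here.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class SDV(NamedTuple):
    """Hypothetical container for the three SDV identifiers."""
    source_id: Optional[str]
    dataset_id: Optional[str]
    version_id: Optional[str]


def unpack_dataset_dir(conversion_root: str, dataset_dir: str) -> SDV:
    """Recover SDV identifiers from a --cr-dataset-dir value.

    Assumes the layout <root>/<source-id>/<dataset-id>/version/<version-id>/...
    Identifiers that the path is too shallow to provide are returned as None.
    Raises ValueError if dataset_dir is not under conversion_root.
    """
    root = PurePosixPath(conversion_root)
    parts = PurePosixPath(dataset_dir).relative_to(root).parts

    source_id = parts[0] if len(parts) > 0 else None
    dataset_id = parts[1] if len(parts) > 1 else None
    # The version identifier follows the literal 'version' directory.
    has_version = len(parts) > 3 and parts[2] == 'version'
    version_id = parts[3] if has_version else None
    return SDV(source_id, dataset_id, version_id)


if __name__ == '__main__':
    root = '/Users/me/projects/twc-ieeevis/data/source'
    deep = root + '/ieeevis-tw-rpi-edu/data-carves/version/experiment-1/source/part-1'
    print(unpack_dataset_dir(root, deep))
```

Directories below the version (source, manual, automatic, and anything under them) do not change the result, which matches the way the deeper example paths still identify the same source, dataset, and version.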
Accepting SDV params via XSL (see also XSL crib sheet)
- <xsl:param name="cr-base-uri" select="'http://my.com'"/>
- <xsl:param name="cr-source-id" select="'epa-gov'"/>
- <xsl:param name="cr-dataset-id" select="'some-dataset'"/>
- <xsl:param name="cr-version-id" select="'latest'"/>
- <xsl:param name="cr-portion-id" select="''"/> (rename of conversion:subject_discriminator)
- <xsl:variable name="abstract" select="concat($cr-base-uri,'/source/',$cr-source-id,'/dataset/',$cr-dataset-id)"/>
- <xsl:variable name="sdv" select="concat($cr-base-uri,'/source/',$cr-source-id,'/dataset/',$cr-dataset-id,'/version/',$cr-version-id)"/>
Passing SDV params to XSL:
- cr-sdv.sh --attribute-value
OPeNDAP += PROV [pingback]
- Prov.cr_base_uri, Prov.cr_data_root, Prov.cr_source_id, Prov.cr_dataset_id, Prov.cr_dataset_dir
- https://github.com/tetherless-world/opendap/wiki/Use-case:-mockup-tracer#wiki-processing-data-from-opendap-using-http
- https://github.com/tetherless-world/opendap/wiki/OPeNDAP-PROV-Module#wiki-configuration
- Needs a way to specify a version-naming template.
SemantEco Annotator
- TBD
Python
- See the snippets below.
- Invoke it with arguments, e.g. python ../../src/json2rdf.py `cr-sdv.sh --attribute-value--` $json > $ttl (the backticks perform shell command substitution)

import argparse

def lift(_json, sdv):

   # Derive the URI of each SDV level from the base URI and the three identifiers.
   sdv.source    = sdv.base + '/source/' + sdv.s
   sdv.abstract  = sdv.base + '/source/' + sdv.s + '/dataset/' + sdv.d
   sdv.versioned = sdv.base + '/source/' + sdv.s + '/dataset/' + sdv.d + '/version/' + sdv.v
...

if __name__ == '__main__':

   HELP='https://github.com/timrdf/csv2rdf4lod-automation/wiki/SDV-organization'
   parser = argparse.ArgumentParser(description='Load Database from JSON')
   parser.add_argument('--cr-base-uri',   dest='base',     help=HELP, required=True)
   parser.add_argument('--cr-source-id',  dest='s',        help=HELP, required=True)
   parser.add_argument('--cr-dataset-id', dest='d',        help=HELP, required=True)
   parser.add_argument('--cr-version-id', dest='v',        help=HELP, required=True)
   parser.add_argument('json', nargs='+',                  help='Input JSON file')

   # The parsed namespace doubles as the SDV record handed to lift().
   args = parser.parse_args()
   sdv = args

   for json_file in args.json:
      lift(json_file, sdv)
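The snippet above requires --cr-version-id, while the parameter list earlier notes that it is optional and that the called system should supply a default when it is omitted. A minimal sketch of that variant follows. The fallback value 'latest' is an assumption borrowed from the XSL param default shown above, and the helper names `parse_sdv` and `sdv_uris` are hypothetical.

```python
import argparse


def parse_sdv(argv):
    """Parse SDV parameters, treating --cr-version-id as optional."""
    parser = argparse.ArgumentParser(description='Parse SDV parameters')
    parser.add_argument('--cr-base-uri',   dest='base', required=True)
    parser.add_argument('--cr-source-id',  dest='s',    required=True)
    parser.add_argument('--cr-dataset-id', dest='d',    required=True)
    # Assumption: fall back to 'latest' when the caller omits the version.
    parser.add_argument('--cr-version-id', dest='v',    default='latest')
    return parser.parse_args(argv)


def sdv_uris(sdv):
    """Return the (source, abstract, versioned) URIs for the parsed arguments."""
    source    = sdv.base + '/source/' + sdv.s
    abstract  = source + '/dataset/' + sdv.d
    versioned = abstract + '/version/' + sdv.v
    return source, abstract, versioned


if __name__ == '__main__':
    args = parse_sdv(['--cr-base-uri',   'http://ieeevis.tw.rpi.edu',
                      '--cr-source-id',  'ieeevis-tw-rpi-edu',
                      '--cr-dataset-id', 'data-carves'])
    for uri in sdv_uris(args):
        print(uri)
```

A real system would more likely derive its default version from its own conventions (for example a date or run identifier); the fixed string here only keeps the sketch short.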

Recognizing that a process is "SDV situated"

Asserting situated instances

cr-dataset-uri.sh --void provides the VoID of the SDV hierarchy.

What is next

SDV organization is described in our IPAW 2014 paper.
