
SDV organization

Tim L edited this page Jul 11, 2015 · 63 revisions

SDV organization uses three aspects of a dataset ("source", "dataset", and "version") to organize:

  • ... the many datasets that a source may have, and
  • ... the many versions that a source may issue for a particular dataset.

Definitions for each of the three aspects:

  • Source, the agent (person, organization) providing the dataset.
  • Dataset, an abstract portion of all the agent’s data.
  • Version, a concrete portion of an agent’s abstract dataset.
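
As a concrete illustration, the three aspects can be held in a small record and combined into the URI pattern that the XSL and Python snippets on this page use. This is only a minimal sketch: the class name `SDV` and its method names are hypothetical (they are not part of csv2rdf4lod), and the example values are borrowed from the data carver arguments listed further down.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SDV:
    """Hypothetical holder for the three SDV aspects (not a csv2rdf4lod API)."""
    base_uri: str                     # plays the role of CSV2RDF4LOD_BASE_URI
    source_id: str                    # the agent providing the data
    dataset_id: str                   # the abstract dataset
    version_id: Optional[str] = None  # the concrete version, if known

    def source_uri(self) -> str:
        return f"{self.base_uri}/source/{self.source_id}"

    def abstract_uri(self) -> str:
        return f"{self.source_uri()}/dataset/{self.dataset_id}"

    def versioned_uri(self) -> str:
        if self.version_id is None:
            raise ValueError("no version identifier was given")
        return f"{self.abstract_uri()}/version/{self.version_id}"


sdv = SDV("http://ieeevis.tw.rpi.edu", "ieeevis-tw-rpi-edu",
          "data-carves", "experiment-1")
print(sdv.abstract_uri())
print(sdv.versioned_uri())
```

One source can then be shared by many `SDV` records that differ only in `dataset_id`, and one abstract dataset by many records that differ only in `version_id`.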

The following pages describe the basics of applying "SDV" to organize others' datasets.

The following pages describe how SDV organization is used to automatically create new dataset versions.

When other systems are situated within SDV organization, they can leverage the naming convention in their processing and results. The following lists the parameter names used to provide the SDV aspects to the systems during invocation.

  • Invoking csv2rdf4lod was the original "SDV situated" process.
    • Uses a shell variable to determine the converter, defaulting to edu.rpi.tw.data.csv.CSVtoRDF (here)
    • Invokes it as $csv2rdf with a pile of arguments (here)
  • Situating a FAqT Brick into csv2rdf4lod automation
  • Situating a visual strategy into csv2rdf4lod automation
  • VSR's Content augmentation
  • Situating a data carver session into csv2rdf4lod automation uses the following input arguments
    • --cr-base-uri=http://ieeevis.tw.rpi.edu (== CSV2RDF4LOD_BASE_URI)
    • --cr-conversion-root=/Users/me/projects/twc-ieeevis/data/source (--cr-data-root is a synonym.)
    • --cr-source-id=ieeevis-tw-rpi-edu
    • --cr-dataset-id=data-carves
    • --cr-version-id=experiment-1 (optional; if omitted, the called system should provide a default)
    • Alternatively, --cr-source-id, --cr-dataset-id, and --cr-version-id can be packed into the single parameter --cr-dataset-dir, using any of the following forms:
    • --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu
    • --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves
    • --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves/version
    • --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves/version/experiment-1
    • --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves/version/experiment-1/source
    • --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves/version/experiment-1/manual
    • --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves/version/experiment-1/automatic
    • --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves/version/experiment-1/source/part-1
  • Accepting SDV params via XSL (see also XSL crib sheet)
    • <xsl:param name="cr-base-uri" select="'http://my.com'"/>
    • <xsl:param name="cr-source-id" select="'epa-gov'"/>
    • <xsl:param name="cr-dataset-id" select="'some-dataset'"/>
    • <xsl:param name="cr-version-id" select="'latest'"/>
    • <xsl:param name="cr-portion-id" select="''"/> (rename of conversion:subject_discriminator)
    • <xsl:variable name="abstract" select="concat($cr-base-uri,'/source/',$cr-source-id,'/dataset/',$cr-dataset-id)"/>
    • <xsl:variable name="sdv" select="concat($cr-base-uri,'/source/',$cr-source-id,'/dataset/',$cr-dataset-id,'/version/',$cr-version-id)"/>
  • Passing SDV params to XSL:
    • cr-sdv.sh --attribute-value
  • OPeNDAP += PROV [pingback]
  • SemantEco Annotator
    • TBD
  • Python
    • See the snippets below.
    • Invoke it with arguments using e.g. python ../../src/json2rdf.py 'cr-sdv.sh --attribute-value--' $json > $ttl (where the single quotes are back ticks, so that the output of cr-sdv.sh becomes the script's arguments)
import argparse

def lift(_json, sdv):
   # Derive the source, abstract dataset, and versioned dataset URIs from the SDV aspects.
   sdv.source    = sdv.base + '/source/' + sdv.s
   sdv.abstract  = sdv.base + '/source/' + sdv.s + '/dataset/' + sdv.d
   sdv.versioned = sdv.base + '/source/' + sdv.s + '/dataset/' + sdv.d + '/version/' + sdv.v
   ...

if __name__ == '__main__':

   HELP = 'https://github.com/timrdf/csv2rdf4lod-automation/wiki/SDV-organization'
   parser = argparse.ArgumentParser(description='Load Database from JSON')
   parser.add_argument('--cr-base-uri',   dest='base',     help=HELP, required=True)
   parser.add_argument('--cr-source-id',  dest='s',        help=HELP, required=True)
   parser.add_argument('--cr-dataset-id', dest='d',        help=HELP, required=True)
   parser.add_argument('--cr-version-id', dest='v',        help=HELP, required=True)
   parser.add_argument('json', nargs='+',                  help='Input JSON file')

   args = parser.parse_args()
   # The parsed namespace doubles as the SDV record that lift() annotates.
   sdv = args

   # Lift each input JSON file in turn.
   for json_file in args.json:
      lift(json_file, sdv)
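
The --cr-dataset-dir forms listed above (under the data carver item) suggest that a called system can recover the SDV identifiers from a directory path alone. The following is a minimal sketch of that unpacking, assuming the layout `<conversion root>/<source-id>/<dataset-id>/version/<version-id>/...` shown in those examples. The function `unpack_dataset_dir` and the `Unpacked` tuple are hypothetical names, not part of csv2rdf4lod-automation.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class Unpacked(NamedTuple):
    source_id: str
    dataset_id: Optional[str]
    version_id: Optional[str]


def unpack_dataset_dir(dataset_dir: str, conversion_root: str) -> Unpacked:
    """Recover SDV identifiers from a directory below the conversion root.

    Assumed layout: <root>/<source-id>/<dataset-id>/version/<version-id>/...
    """
    # relative_to() raises ValueError if dataset_dir is not under the root.
    parts = PurePosixPath(dataset_dir).relative_to(conversion_root).parts
    if not parts:
        raise ValueError("directory does not name a source")
    source_id = parts[0]
    dataset_id = parts[1] if len(parts) > 1 else None
    # A version is only known once the path continues past the "version" step.
    version_id = parts[3] if len(parts) > 3 and parts[2] == "version" else None
    return Unpacked(source_id, dataset_id, version_id)


root = "/Users/me/projects/twc-ieeevis/data/source"
print(unpack_dataset_dir(
    root + "/ieeevis-tw-rpi-edu/data-carves/version/experiment-1/source/part-1",
    root))
```

Identifiers that the path does not reach are returned as `None`, which leaves the called system free to supply its own default, as the --cr-version-id item above allows.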

Recognizing that a process is "SDV situated"

Asserting situated instances

cr-dataset-uri.sh --void provides the VoID of the SDV hierarchy.

What is next
