SDV organization
Tim L edited this page Jul 11, 2015 · 63 revisions
SDV organization uses three aspects of a dataset ("source", "dataset", and "version") to organize:
- ... the many datasets that a source may have, and
- ... the many versions that a source may issue for a particular dataset.
Definitions for each of the three aspects:
- Source, the agent (person, organization) providing the dataset.
- Dataset, an abstract portion of all the agent’s data.
- Version, a concrete portion of an agent’s abstract dataset.
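The three aspects nest: a version belongs to a dataset, and a dataset belongs to a source, so the URI for each aspect can extend the one above it. Below is a minimal Python sketch. It assumes the /source/<source>/dataset/<dataset>/version/<version> URI pattern used by the XSL and Python snippets later on this page. The SDV class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SDV:
    """Hypothetical holder for the three SDV aspects plus the base URI."""
    base: str     # e.g. the value of --cr-base-uri
    source: str   # source identifier, e.g. 'epa-gov'
    dataset: str  # dataset identifier, e.g. 'some-dataset'
    version: str  # version identifier, e.g. 'latest'

    def source_uri(self) -> str:
        # The agent (person, organization) providing the data.
        return f"{self.base}/source/{self.source}"

    def abstract_uri(self) -> str:
        # The abstract dataset, shared by every version the source issues.
        return f"{self.source_uri()}/dataset/{self.dataset}"

    def version_uri(self) -> str:
        # One concrete version of the abstract dataset.
        return f"{self.abstract_uri()}/version/{self.version}"


sdv = SDV("http://my.com", "epa-gov", "some-dataset", "latest")
print(sdv.version_uri())
```

Because each method builds on the previous one, the version URI always has the abstract dataset URI as a prefix, which mirrors how a version is a portion of its abstract dataset.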
The following pages describe the basics of applying SDV organization to others' datasets.
- Conversion process phase: name
- Conversion process phase: retrieve
- When using the file system to organize data, we use some Directory Conventions.
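On the file system, the same nesting can be read off the --cr-dataset-dir examples later on this page. The following is a minimal sketch. It assumes the layout <conversion-root>/<source-id>/<dataset-id>/version/<version-id>, with source, manual, and automatic subdirectories as in those examples. version_dir is a hypothetical helper, not part of csv2rdf4lod.

```python
from pathlib import PurePosixPath


def version_dir(conversion_root: str, source_id: str, dataset_id: str,
                version_id: str) -> PurePosixPath:
    """Hypothetical helper: directory holding one version of a dataset.

    Layout assumed from the --cr-dataset-dir examples on this page:
      <conversion-root>/<source-id>/<dataset-id>/version/<version-id>
    """
    return PurePosixPath(conversion_root) / source_id / dataset_id / "version" / version_id


root = "/Users/me/projects/twc-ieeevis/data/source"
v = version_dir(root, "ieeevis-tw-rpi-edu", "data-carves", "experiment-1")
# Subdirectories that appear under a version in the examples on this page.
for sub in ("source", "manual", "automatic"):
    print(v / sub)
```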
The following pages describe how SDV organization is used to automatically create new dataset versions.
- Automated creation of a new Versioned Dataset
- Aggregating subsets of converted datasets
- Secondary Derivative Datasets
- Triggers
When other systems are situated within SDV organization, they can leverage the naming convention in their processing and results. The following list gives the parameter names used to pass the SDV aspects to these systems when they are invoked.
- Invoking csv2rdf4lod was the original "SDV situated" process.
  - Uses a shell variable to determine the converter, defaulting to edu.rpi.tw.data.csv.CSVtoRDF (here).
  - Invokes it as $csv2rdf with a pile of arguments (here).
- Situating a FAqT Brick into csv2rdf4lod automation
- Situating a visual strategy into csv2rdf4lod automation
  - visual-artifact-uri= with value from "cr-dataset-uri.sh --uri"
- VSR's Content augmentation
- Situating a data carver session into csv2rdf4lod automation uses the following input arguments:
  - --cr-base-uri=http://ieeevis.tw.rpi.edu (== CSV2RDF4LOD_BASE_URI)
  - --cr-conversion-root=/Users/me/projects/twc-ieeevis/data/source (--cr-data-root is a synonym.)
  - --cr-source-id=ieeevis-tw-rpi-edu
  - --cr-dataset-id=data-carves
  - --cr-version-id=experiment-1 (optional; if omitted, the called system should provide a default)
  - Or, --cr-source-id, --cr-dataset-id, and --cr-version-id can be packed into the single parameter --cr-dataset-dir, which may point at any level of the hierarchy, e.g.:
    - --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu
    - --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves
    - --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves/version
    - --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves/version/experiment-1
    - --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves/version/experiment-1/source
    - --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves/version/experiment-1/manual
    - --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves/version/experiment-1/automatic
    - --cr-dataset-dir=/Users/me/projects/twc-ieeevis/data/source/ieeevis-tw-rpi-edu/data-carves/version/experiment-1/source/part-1
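This page does not show how a called system unpacks --cr-dataset-dir. The Python sketch below shows one possible approach, assuming the conversion root is also known. The conversion root ends in a directory named source, and so does a version's own source subdirectory. That makes matching on directory names alone ambiguous, so the sketch strips the root first and reads the remaining levels by position. unpack_dataset_dir is a hypothetical name.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def unpack_dataset_dir(conversion_root: str, dataset_dir: str) -> Dict[str, Optional[str]]:
    """Recover the SDV ids packed into a --cr-dataset-dir value (hypothetical helper).

    Assumes dataset_dir lies under conversion_root and follows
    <root>/<source-id>/<dataset-id>/version/<version-id>/...
    Raises ValueError if dataset_dir is not under conversion_root.
    """
    # Strip the root first: the root and a version's "source" subdirectory
    # share a name, so searching for "source" would be ambiguous.
    parts = PurePosixPath(dataset_dir).relative_to(conversion_root).parts
    source_id = parts[0] if len(parts) > 0 else None
    dataset_id = parts[1] if len(parts) > 1 else None
    # parts[2] is the literal "version" directory; the id sits one level deeper.
    version_id = parts[3] if len(parts) > 3 and parts[2] == "version" else None
    return {"cr-source-id": source_id, "cr-dataset-id": dataset_id, "cr-version-id": version_id}


ROOT = "/Users/me/projects/twc-ieeevis/data/source"
print(unpack_dataset_dir(ROOT, ROOT + "/ieeevis-tw-rpi-edu"))
print(unpack_dataset_dir(ROOT, ROOT + "/ieeevis-tw-rpi-edu/data-carves/version/experiment-1/source/part-1"))
```

Levels that the path does not reach come back as None. This leaves room for the called system to supply a default version, as the --cr-version-id entry allows.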
- Accepting SDV params via XSL (see also XSL crib sheet):

```xml
<xsl:param name="cr-base-uri"   select="'http://my.com'"/>
<xsl:param name="cr-source-id"  select="'epa-gov'"/>
<xsl:param name="cr-dataset-id" select="'some-dataset'"/>
<xsl:param name="cr-version-id" select="'latest'"/>

<!-- cr-portion-id is a rename of conversion:subject_discriminator -->
<xsl:param name="cr-portion-id" select="''"/>

<xsl:variable name="abstract" select="concat($cr-base-uri,'/source/',$cr-source-id,'/dataset/',$cr-dataset-id)"/>
<xsl:variable name="sdv"      select="concat($cr-base-uri,'/source/',$cr-source-id,'/dataset/',$cr-dataset-id,'/version/',$cr-version-id)"/>
```
- Passing SDV params to XSL: cr-sdv.sh --attribute-value
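This page does not show what cr-sdv.sh --attribute-value prints. As a hypothetical illustration only, the helper below renders the SDV aspects as name=value pairs. That is the form Saxon's command line accepts for stylesheet parameters. The names match the xsl:param declarations above. xsl_param_args is an assumed name and is not a description of cr-sdv.sh itself.

```python
import shlex
from typing import List


def xsl_param_args(base: str, source_id: str, dataset_id: str, version_id: str) -> List[str]:
    """Hypothetical helper: SDV aspects as name=value stylesheet parameters.

    Parameter names match the xsl:param declarations on this page; the
    name=value form is what Saxon's command line accepts. This is not
    a description of what cr-sdv.sh emits.
    """
    params = {
        "cr-base-uri": base,
        "cr-source-id": source_id,
        "cr-dataset-id": dataset_id,
        "cr-version-id": version_id,
    }
    return [f"{name}={value}" for name, value in params.items()]


args = xsl_param_args("http://my.com", "epa-gov", "some-dataset", "latest")
# Quote each pair so values with spaces survive a shell command line.
print(" ".join(shlex.quote(a) for a in args))
```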
- OPeNDAP += PROV [pingback]
  - Prov.cr_base_uri, Prov.cr_data_root, Prov.cr_source_id, Prov.cr_dataset_id, Prov.cr_dataset_dir
  - https://github.com/tetherless-world/opendap/wiki/Use-case:-mockup-tracer#wiki-processing-data-from-opendap-using-http
  - https://github.com/tetherless-world/opendap/wiki/OPeNDAP-PROV-Module#wiki-configuration
  - Needs a way to specify a version-naming template.
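The Prov.cr_* setting names parallel the --cr-* argument names listed above. The sketch below assumes, based only on the matching names, that each Prov.cr_<name> setting carries the same SDV aspect as --cr-<name>, with underscores becoming hyphens. prov_config_to_flags is a hypothetical helper, not part of the OPeNDAP PROV module.

```python
from typing import Dict, List


def prov_config_to_flags(config: Dict[str, str]) -> List[str]:
    """Hypothetical helper: map Prov.cr_* settings to --cr-* arguments.

    Assumption drawn only from the matching names: Prov.cr_<name>
    corresponds to --cr-<name>, with underscores becoming hyphens.
    Keys without the Prov.cr_ prefix are ignored.
    """
    prefix = "Prov.cr_"
    flags = []
    for key, value in config.items():
        if key.startswith(prefix):
            name = key[len(prefix):].replace("_", "-")
            flags.append(f"--cr-{name}={value}")
    return flags


config = {
    "Prov.cr_base_uri": "http://ieeevis.tw.rpi.edu",
    "Prov.cr_source_id": "ieeevis-tw-rpi-edu",
    "Prov.cr_dataset_id": "data-carves",
}
print(prov_config_to_flags(config))
```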
- SemantEco Annotator
  - TBD
- Python
  - See the snippets below.
  - Invoke it with arguments using, e.g., python ../../src/json2rdf.py 'cr-sdv.sh --attribute-value' $json > $ttl
    (where the single quotes are back ticks, i.e., command substitution).

```python
import argparse


def lift(_json, sdv):
    # Mint the source, abstract-dataset, and versioned-dataset URIs
    # from the parsed --cr-* arguments.
    sdv.source = sdv.base + '/source/' + sdv.s
    sdv.abstract = sdv.base + '/source/' + sdv.s + '/dataset/' + sdv.d
    sdv.versioned = sdv.base + '/source/' + sdv.s + '/dataset/' + sdv.d + '/version/' + sdv.v
    ...  # remainder elided on this page


if __name__ == '__main__':
    HELP = 'https://github.com/timrdf/csv2rdf4lod-automation/wiki/SDV-organization'
    parser = argparse.ArgumentParser(description='Load Database from JSON')
    # Each SDV aspect arrives as a --cr-* option (see the parameter list above).
    parser.add_argument('--cr-base-uri', dest='base', help=HELP, required=True)
    parser.add_argument('--cr-source-id', dest='s', help=HELP, required=True)
    parser.add_argument('--cr-dataset-id', dest='d', help=HELP, required=True)
    parser.add_argument('--cr-version-id', dest='v', help=HELP, required=True)
    parser.add_argument('json', nargs='+', help='Input JSON file')
    args = parser.parse_args()
    sdv = args  # the argparse Namespace doubles as the SDV holder
    for json_path in args.json:
        lift(json_path, sdv)
```
- cr-dataset-uri.sh --void provides the VoID of the SDV hierarchy.
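This page does not show what cr-dataset-uri.sh --void emits. As a guess at the general shape, the sketch below describes an abstract dataset and one of its versions as VoID datasets, with the version linked as a void:subset of the abstract dataset. It writes N-Triples with the same URI pattern as the snippets above. The choice of void:subset is an assumption, and sdv_void_triples is a hypothetical name.

```python
from typing import List

VOID = "http://rdfs.org/ns/void#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def sdv_void_triples(base: str, source_id: str, dataset_id: str, version_id: str) -> List[str]:
    """Hypothetical helper: N-Triples for an abstract dataset and one version.

    Assumed shape: both are void:Dataset, and the version is a
    void:subset of the abstract dataset. Not the output of cr-dataset-uri.sh.
    """
    abstract = f"{base}/source/{source_id}/dataset/{dataset_id}"
    version = f"{abstract}/version/{version_id}"
    return [
        f"<{abstract}> <{RDF_TYPE}> <{VOID}Dataset> .",
        f"<{version}> <{RDF_TYPE}> <{VOID}Dataset> .",
        f"<{abstract}> <{VOID}subset> <{version}> .",
    ]


for line in sdv_void_triples("http://my.com", "epa-gov", "some-dataset", "latest"):
    print(line)
```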
- SDV organization is described in our IPAW 2014 paper.