CSV2RDF4LOD environment variables
- Home for an overview
- Installing csv2rdf4lod automation
- [cr-vars.sh](Script: cr vars.sh) shows the current values for all [relevant] environment variables.
This page describes why the CSV2RDF4LOD environment variables exist, how to set them, and how to recreate their settings. It also describes specific variables in more detail.
csv2rdf4lod-automation is a set of shell scripts that support the retrieval, organization, conversion, and publishing of tabular data. csv2rdf4lod-automation invokes csv2rdf4lod, a Java jar implementing the conversion vocabulary, which specifies declarative enhancements that can be applied to tabular literals to create well-structured, highly-connected RDF representations.
When using csv2rdf4lod-automation in a unix shell, the scripts refer to a variety of CSV2RDF4LOD_ environment variables to determine what processing they should or should not do, and how they should do it. If the environment variables are not set, the conversion will not work at all or will not work the way you'd like it to.
The most authoritative documentation for each of these environment variables is in the comments of $CSV2RDF4LOD_HOME/bin/setup.sh. Do not edit setup.sh, because it is the template used to create your very own my-csv2rdf4lod-source-me.sh when you [install csv2rdf4lod-automation](Installing csv2rdf4lod automation). Instead, edit my-csv2rdf4lod-source-me.sh to suit your system.
Again, the values in $CSV2RDF4LOD_HOME/bin/setup.sh DO NOT INFLUENCE the automation -- only those in my-csv2rdf4lod-source-me.sh do. Edit my-csv2rdf4lod-source-me.sh and not $CSV2RDF4LOD_HOME/bin/setup.sh.
$CSV2RDF4LOD_HOME/install.sh uses $CSV2RDF4LOD_HOME/bin/setup.sh to create your my-csv2rdf4lod-source-me.sh for your system.
You edit my-csv2rdf4lod-source-me.sh; the documentation is in $CSV2RDF4LOD_HOME/bin/setup.sh.
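As a rough illustration of the edit-then-source workflow, the sketch below writes a tiny stand-in for my-csv2rdf4lod-source-me.sh into a temporary directory and sources it. It assumes the file consists of plain `export` lines; the three values are examples only, and the temporary path is hypothetical. Your real file is the one created by install.sh.

```shell
# Illustration only: a stand-in source-me file in a throwaway directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/my-csv2rdf4lod-source-me.sh" <<'EOF'
# Example values; the documentation for each variable is in
# $CSV2RDF4LOD_HOME/bin/setup.sh.
export CSV2RDF4LOD_BASE_URI="http://logd.tw.rpi.edu"
export CSV2RDF4LOD_PUBLISH_TTL="true"
export CSV2RDF4LOD_PUBLISH_NT="false"
EOF

# Sourcing (rather than executing) the file makes the exported
# variables visible in the current shell.
. "$tmpdir/my-csv2rdf4lod-source-me.sh"

echo "CSV2RDF4LOD_BASE_URI=$CSV2RDF4LOD_BASE_URI"
```

Executing the file as a script would set the variables only in a child process, which is why the file needs to be sourced.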
Invoking cr-vars.sh will show all variables and either their current value or a comment about what the value will default to:
bash-3.2$ cr-vars.sh
--
CSV2RDF4LOD_HOME /Users/timrdf/Desktop/csv2rdf4lod-automation
CSV2RDF4LOD_BASE_URI http://logd.tw.rpi.edu
CSV2RDF4LOD_BASE_URI_OVERRIDE (not required, $CSV2RDF4LOD_BASE_URI will be used.)
--
CSV2RDF4LOD_CONVERT_MACHINE_URI http://tw.rpi.edu/web/inside/machine/lebot_macbook#
CSV2RDF4LOD_CONVERT_PERSON_URI http://tw.rpi.edu/instances/TimLebo
--
CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS (will default to: 2)
CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY false
CSV2RDF4LOD_CONVERT_DUMP_FILE_EXTENSIONS ttl.tgz,nt
CSV2RDF4LOD_CONVERT_PROVENANCE_GRANULAR (will default to: false)
--
CSV2RDF4LOD_PUBLISH (will default to: true)
CSV2RDF4LOD_PUBLISH_DELAY_UNTIL_ENHANCED true
CSV2RDF4LOD_PUBLISH_COMPRESS (will default to: false)
CSV2RDF4LOD_PUBLISH_OUR_SOURCE_ID (will not archive conversion metadata into versioned dataset.)
CSV2RDF4LOD_PUBLISH_OUR_DATASET_ID (will not archive conversion metadata into versioned dataset.)
CSV2RDF4LOD_PUBLISH_TTL true
CSV2RDF4LOD_PUBLISH_TTL_LAYERS true
CSV2RDF4LOD_PUBLISH_NT false
CSV2RDF4LOD_PUBLISH_RDFXML false
--
CSV2RDF4LOD_PUBLISH_SUBSET_VOID true
CSV2RDF4LOD_PUBLISH_SUBSET_VOID_NAMED_GRAPH (will default to: auto)
CSV2RDF4LOD_PUBLISH_SUBSET_SAMEAS true
CSV2RDF4LOD_PUBLISH_SUBSET_SAMEAS_NAMED_GRAPH (will default to: auto)
CSV2RDF4LOD_PUBLISH_SUBSET_SAMPLES false
--
CSV2RDF4LOD_PUBLISH_CONVERSION_PARAMS_NAMED_GRAPH (will default to: auto)
--
CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION false
CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION_WWW_ROOT (will default to: VVV/publish/lod-mat/)
CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION_WRITE_FREQUENCY (will default to: 1,000,000)
CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION_REPORT_FREQUENCY (will default to: 1,000)
CSV2RDF4LOD_CONCURRENCY 2
--
CSV2RDF4LOD_PUBLISH_TDB false
CSV2RDF4LOD_PUBLISH_TDB_DIR (will default to: VVV/publish/tdb/)
CSV2RDF4LOD_PUBLISH_TDB_INDIV false
--
CSV2RDF4LOD_PUBLISH_4STORE false
CSV2RDF4LOD_PUBLISH_4STORE_KB (will default to: csv2rdf4lod -- /var/lib/4store/csv2rdf4lod)
--
CSV2RDF4LOD_PUBLISH_VIRTUOSO (will default to: false)
--
CSV2RDF4LOD_PUBLISH_SPARQL_ENDPOINT (will default to: none)
CSV2RDF4LOD_PUBLISH_SPARQL_RESULTS_DIRECTORY (will default to: none)
--
see documentation for variables in:
/Users/timrdf/Desktop/csv2rdf4lod-automation/bin/setup.sh
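If cr-vars.sh is not yet on your path, a similar check can be approximated with standard shell features. The sketch below is not how cr-vars.sh is implemented; `show_var` is a hypothetical helper that prints a variable's value or a note that it is unset, and the final line lists everything exported with the CSV2RDF4LOD_ prefix.

```shell
# Hypothetical helper, not part of csv2rdf4lod-automation:
# print NAME and its value, or a note when it is unset or empty.
show_var() {
    name=$1
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
        printf '%s %s\n' "$name" "$value"
    else
        printf '%s (not set)\n' "$name"
    fi
}

# Example state for the demonstration.
export CSV2RDF4LOD_CONCURRENCY=2
unset CSV2RDF4LOD_PUBLISH_COMPRESS

show_var CSV2RDF4LOD_CONCURRENCY
show_var CSV2RDF4LOD_PUBLISH_COMPRESS

# List every exported variable whose name starts with CSV2RDF4LOD_.
env | grep '^CSV2RDF4LOD_' | sort
```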
Some variables are discussed beyond the authoritative comments in $CSV2RDF4LOD_HOME/bin/setup.sh:
- CSV2RDF4LOD_RETRIEVE_DROID_SOURCES specifies whether or not to DROID the files in a conversion cockpit's source/ directory. (see this)
- CSV2RDF4LOD_CONVERT_DUMP_FILE_EXTENSIONS is discussed at Conversion process phase: publish
- CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION_WWW_ROOT needs to be set to publish the dump files in each conversion cockpit's publish/ directory into your server's htdocs directory (e.g. /var/www/). No trailing slash.
See also Publishing phase.
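Since the note above asks for no trailing slash, one way to guard against a stray one in your source-me file is shell parameter expansion. This is a minimal sketch with an example value; it is an assumption about how you might normalize the setting yourself, not something the automation is known to do.

```shell
# Example value with a stray trailing slash.
CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION_WWW_ROOT="/var/www/"

# ${var%/} removes one trailing slash if present and leaves other values alone.
# (A value of just "/" would become empty, so avoid using the filesystem root.)
CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION_WWW_ROOT="${CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION_WWW_ROOT%/}"
export CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION_WWW_ROOT

echo "$CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION_WWW_ROOT"
```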
This will always 'git pull' the csv2rdf4lod-automation repository.
From within a conversion cockpit, the conversion trigger will source the following files if they exist:
../csv2rdf4lod-source-me.sh
../../csv2rdf4lod-source-me.sh
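The "source if it exists" behavior can be sketched as follows. This is an illustration, not the conversion trigger's actual code: `source_cockpit_settings` is a hypothetical function name, the order shown simply follows the list above, and the demonstration uses a throwaway directory tree standing in for a real cockpit.

```shell
# Hypothetical sketch: source each candidate file that exists,
# in the order listed above.
source_cockpit_settings() {
    for candidate in ../csv2rdf4lod-source-me.sh ../../csv2rdf4lod-source-me.sh; do
        if [ -f "$candidate" ]; then
            . "$candidate"
        fi
    done
}

# Demonstration: a temporary tree where only the parent directory
# of the stand-in cockpit has a source-me file.
demo=$(mktemp -d)
mkdir -p "$demo/a/b/cockpit"
echo 'export CSV2RDF4LOD_PUBLISH_NT="true"' > "$demo/a/b/csv2rdf4lod-source-me.sh"

cd "$demo/a/b/cockpit" && source_cockpit_settings
echo "CSV2RDF4LOD_PUBLISH_NT=$CSV2RDF4LOD_PUBLISH_NT"
```

With this layout, settings can be placed one or two directories above a cockpit and picked up without editing your personal source-me file.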
- Script: source me.sh - the canonical script to set all CSV2RDF4LOD environment variables.
- Further considerations for distributed environments - for more advanced uses of the environment variables when working with multiple systems, projects, and people. Discusses cr-where-was-envvar-set.sh to find out which source-me set a given variable.
- Reusing enhancement parameters for multiple versions or datasets
http://code.google.com/p/data-gov-wiki/issues/detail?id=47
- Script: source me.sh - The one file that is used to set all of the CSV2RDF4LOD environment variables. source this file in each unix shell in which you'd like to use the converter (or, source this file in your .bashrc).
- Further considerations for distributed environments need to be made if you are adopting csv2rdf4lod-automation as part of a team project spread across machines via version control.