
CSV2RDF4LOD environment variables

Tim L edited this page Feb 9, 2014 · 41 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

What is first

What we will cover

This page describes why the CSV2RDF4LOD environment variables exist, how to set them, and how to recreate their settings. It also describes specific variables in more detail.

Let's get to it!

csv2rdf4lod-automation is a set of shell scripts that support the retrieval, organization, conversion, and publishing of tabular data. csv2rdf4lod-automation invokes csv2rdf4lod, a Java jar implementing the conversion vocabulary, which specifies declarative enhancements that can be applied to tabular literals to create well-structured, highly-connected RDF representations.

When you use csv2rdf4lod-automation in a unix shell, the scripts read a variety of CSV2RDF4LOD_ environment variables to decide what processing they should or should not do, and how they should do it. If the environment variables are not set, the conversion will either not work at all or not work the way you'd like it to.
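As a rough illustration of how a script can consult these variables, the sketch below falls back to the defaults that cr-vars.sh reports later on this page (true for CSV2RDF4LOD_PUBLISH, false for CSV2RDF4LOD_PUBLISH_COMPRESS, 2 for CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS) when a variable is unset or empty. The helper names are hypothetical; this is not the code the automation actually runs.

```shell
#!/usr/bin/env bash
# Sketch only: one way a script *might* read CSV2RDF4LOD_ variables.
# Defaults mirror the "(will default to: ...)" notes printed by cr-vars.sh.
# All function names here are hypothetical.

# Effective value of CSV2RDF4LOD_PUBLISH (unset or empty -> true).
cr_publish_enabled() {
  echo "${CSV2RDF4LOD_PUBLISH:-true}"
}

# Effective number of example rows (unset or empty -> 2).
cr_example_rows() {
  echo "${CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS:-2}"
}

# Succeeds only when dumps should be compressed (unset or empty -> false).
cr_should_compress() {
  [ "${CSV2RDF4LOD_PUBLISH_COMPRESS:-false}" = "true" ]
}
```

A calling script could then branch with `if cr_should_compress; then ...; fi`, so an unset variable quietly means "use the default" rather than causing an error.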

Documentation for each environment variable

The most authoritative documentation for each of these environment variables is in the comments of $CSV2RDF4LOD_HOME/bin/setup.sh. Do not edit setup.sh: it is the template for your very own my-csv2rdf4lod-source-me.sh, which is created when you install csv2rdf4lod-automation (see Installing csv2rdf4lod automation). Instead, edit my-csv2rdf4lod-source-me.sh to suit your system.

Again, the values in $CSV2RDF4LOD_HOME/bin/setup.sh DO NOT INFLUENCE the automation -- only those in my-csv2rdf4lod-source-me.sh do. Edit my-csv2rdf4lod-source-me.sh and not $CSV2RDF4LOD_HOME/bin/setup.sh.

$CSV2RDF4LOD_HOME/install.sh uses $CSV2RDF4LOD_HOME/bin/setup.sh to create your my-csv2rdf4lod-source-me.sh for your system.

You edit my-csv2rdf4lod-source-me.sh; the documentation is in $CSV2RDF4LOD_HOME/bin/setup.sh.
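Because my-csv2rdf4lod-source-me.sh is meant to be sourced, its settings only take effect in a shell that sources it. The sketch below writes a throwaway stand-in for that file (the example.org URI is a placeholder, not a recommended value) and contrasts executing it in a child process with sourcing it into the current shell.

```shell
#!/usr/bin/env bash
# Sketch: why the settings file must be *sourced*, not executed.
# The file written here is a throwaway stand-in for
# my-csv2rdf4lod-source-me.sh; its values are placeholders.
demo_dir="$(mktemp -d)"
source_me="$demo_dir/my-csv2rdf4lod-source-me.sh"

cat > "$source_me" <<'EOF'
export CSV2RDF4LOD_BASE_URI="http://example.org"
export CSV2RDF4LOD_PUBLISH_COMPRESS="false"
EOF

# Executing the file runs it in a child shell; its exports die with the child.
unset CSV2RDF4LOD_BASE_URI
bash "$source_me"
echo "after bash:   ${CSV2RDF4LOD_BASE_URI:-<unset>}"   # after bash:   <unset>

# Sourcing ("." is the portable spelling of bash's "source") runs the
# exports in the current shell, so they persist.
. "$source_me"
echo "after source: ${CSV2RDF4LOD_BASE_URI:-<unset>}"   # after source: http://example.org

rm -rf "$demo_dir"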

Invoking cr-vars.sh will show all variables and either their current value or a comment about what the value will default to:

bash-3.2$ cr-vars.sh
--
CSV2RDF4LOD_HOME                                         /Users/timrdf/Desktop/csv2rdf4lod-automation
CSV2RDF4LOD_BASE_URI                                     http://logd.tw.rpi.edu
CSV2RDF4LOD_BASE_URI_OVERRIDE                            (not required, $CSV2RDF4LOD_BASE_URI will be used.)
--
CSV2RDF4LOD_CONVERT_MACHINE_URI                          http://tw.rpi.edu/web/inside/machine/lebot_macbook#
CSV2RDF4LOD_CONVERT_PERSON_URI                           http://tw.rpi.edu/instances/TimLebo
--
CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS                  (will default to: 2)
CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY                  false
CSV2RDF4LOD_CONVERT_DUMP_FILE_EXTENSIONS                 ttl.tgz,nt
CSV2RDF4LOD_CONVERT_PROVENANCE_GRANULAR                  (will default to: false)
--
CSV2RDF4LOD_PUBLISH                                      (will default to: true)
CSV2RDF4LOD_PUBLISH_DELAY_UNTIL_ENHANCED                 true
CSV2RDF4LOD_PUBLISH_COMPRESS                             (will default to: false)
CSV2RDF4LOD_PUBLISH_OUR_SOURCE_ID                        (will not archive conversion metadata into versioned dataset.)
CSV2RDF4LOD_PUBLISH_OUR_DATASET_ID                       (will not archive conversion metadata into versioned dataset.)
CSV2RDF4LOD_PUBLISH_TTL                                  true
CSV2RDF4LOD_PUBLISH_TTL_LAYERS                           true
CSV2RDF4LOD_PUBLISH_NT                                   false
CSV2RDF4LOD_PUBLISH_RDFXML                               false
--
CSV2RDF4LOD_PUBLISH_SUBSET_VOID                          true
CSV2RDF4LOD_PUBLISH_SUBSET_VOID_NAMED_GRAPH              (will default to: auto)
CSV2RDF4LOD_PUBLISH_SUBSET_SAMEAS                        true
CSV2RDF4LOD_PUBLISH_SUBSET_SAMEAS_NAMED_GRAPH            (will default to: auto)
CSV2RDF4LOD_PUBLISH_SUBSET_SAMPLES                       false
--
CSV2RDF4LOD_PUBLISH_CONVERSION_PARAMS_NAMED_GRAPH        (will default to: auto)
--
CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION                  false
CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION_WWW_ROOT         (will default to: VVV/publish/lod-mat/)
CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION_WRITE_FREQUENCY  (will default to: 1,000,000)
CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION_REPORT_FREQUENCY (will default to: 1,000)
CSV2RDF4LOD_CONCURRENCY                                  2
--
CSV2RDF4LOD_PUBLISH_TDB                                  false
CSV2RDF4LOD_PUBLISH_TDB_DIR                              (will default to: VVV/publish/tdb/)
CSV2RDF4LOD_PUBLISH_TDB_INDIV                            false
--
CSV2RDF4LOD_PUBLISH_4STORE                               false
CSV2RDF4LOD_PUBLISH_4STORE_KB                            (will default to: csv2rdf4lod -- /var/lib/4store/csv2rdf4lod)
--
CSV2RDF4LOD_PUBLISH_VIRTUOSO                             (will default to: false)
--
CSV2RDF4LOD_PUBLISH_SPARQL_ENDPOINT                      (will default to: none)
CSV2RDF4LOD_PUBLISH_SPARQL_RESULTS_DIRECTORY             (will default to: none)
--
see documentation for variables in:
/Users/timrdf/Desktop/csv2rdf4lod-automation/bin/setup.sh
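The listing above pairs each variable with either its current value or a note about its default. A minimal sketch of that behaviour, assuming bash (for `${!name}` indirect expansion) and covering only three variables whose default notes are copied from the output above, could look like this. It is not the real cr-vars.sh, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only (not the real cr-vars.sh): print a variable's current value,
# or a note about its default when it is unset or empty.
# ${!name} (bash indirect expansion) reads the variable whose name is in $name.
show_cr_var() {
  local name="$1" note="$2"
  local value="${!name:-}"
  printf '%-56s %s\n' "$name" "${value:-$note}"
}

# The default notes below are copied from the cr-vars.sh output above.
show_cr_vars() {
  show_cr_var CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS '(will default to: 2)'
  show_cr_var CSV2RDF4LOD_PUBLISH                     '(will default to: true)'
  show_cr_var CSV2RDF4LOD_PUBLISH_COMPRESS            '(will default to: false)'
}

show_cr_vars
```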

CSV2RDF4LOD not set

Variables also discussed on the wiki

Some variables are discussed beyond the authoritative comments in $CSV2RDF4LOD_HOME/bin/setup.sh:

Environment variables related to publishing

  • CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION_WWW_ROOT needs to be set to publish the dump files in each conversion cockpit's publish/ directory into your server's htdocs directory (e.g. /var/www). Do not include a trailing slash.

See also Publishing phase.
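If this value is produced by another script and might end with a slash, a small guard like the following can strip it before any paths are built. This is a hypothetical helper, and "example/dump.ttl" is a placeholder path. It does not describe how the automation itself lays out files under the web root.

```shell
#!/usr/bin/env bash
# Sketch: return CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION_WWW_ROOT with any
# trailing slashes removed (a bare "/" is left alone).
# The helper name and the "example/dump.ttl" path are hypothetical.
www_root_without_slash() {
  local root="${CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION_WWW_ROOT:-}"
  while [ "${#root}" -gt 1 ] && [ "${root%/}" != "$root" ]; do
    root="${root%/}"   # drop one trailing "/" per iteration
  done
  printf '%s\n' "$root"
}

CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION_WWW_ROOT=/var/www/
echo "$(www_root_without_slash)/example/dump.ttl"   # /var/www/example/dump.ttl
```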

CSV2RDF4LOD_CONVERT_ALWAYS_UPDATE_CONVERTER

When enabled, this variable makes the automation always run 'git pull' to update the csv2rdf4lod-automation repository.
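A minimal sketch of that behaviour follows. It assumes that "true" is the enabling value and that the repository lives at $CSV2RDF4LOD_HOME. The helper name is hypothetical and this is not the automation's actual code.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): update the csv2rdf4lod-automation checkout
# only when CSV2RDF4LOD_CONVERT_ALWAYS_UPDATE_CONVERTER is "true".
# Treating "true" as the enabling value is an assumption.
maybe_update_converter() {
  if [ "${CSV2RDF4LOD_CONVERT_ALWAYS_UPDATE_CONVERTER:-false}" != "true" ]; then
    return 0   # updating disabled: leave the checkout as it is
  fi
  if [ -z "${CSV2RDF4LOD_HOME:-}" ]; then
    echo "CSV2RDF4LOD_HOME not set; cannot update the converter" >&2
    return 1
  fi
  # git -C runs git as if it had been started inside $CSV2RDF4LOD_HOME.
  git -C "$CSV2RDF4LOD_HOME" pull
}
```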

CSV2RDF4LOD_PUBLISH_SPARQL_ENDPOINT_SEPARATE_NG_PROVENANCE

See also

From a conversion cockpit, the conversion trigger sources the following files if they exist:

../csv2rdf4lod-source-me.sh
../../csv2rdf4lod-source-me.sh
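The "source if it exists" step can be sketched as follows. It is run with the conversion cockpit as the working directory. The files are sourced in the order listed above, so when both set the same variable, ../../ wins here. That ordering, like the function name, is an assumption; the real conversion trigger may differ.

```shell
#!/usr/bin/env bash
# Sketch (not the actual conversion trigger): source each
# csv2rdf4lod-source-me.sh found above the current cockpit directory.
# Order follows the listing above; later files override earlier ones.
source_cockpit_settings() {
  local f
  for f in ../csv2rdf4lod-source-me.sh ../../csv2rdf4lod-source-me.sh; do
    if [ -f "$f" ]; then
      . "$f"   # "." sources the file into the current shell
    fi
  done
}
```

Missing files are simply skipped, so a cockpit with no local settings falls back to whatever was already set, for example from my-csv2rdf4lod-source-me.sh.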

Misc.

http://code.google.com/p/data-gov-wiki/issues/detail?id=47

What is next?

  • Script: source me.sh - The one file used to set all of the CSV2RDF4LOD environment variables. Source this file in each unix shell in which you'd like to use the converter (or source it from your .bashrc).
  • Further considerations for distributed environments: relevant if you are adopting csv2rdf4lod-automation as part of a team project spread across machines via version control.