
Using csv2rdf4lod automation without csv2rdf4lod

timrdf edited this page May 31, 2012 · 11 revisions

While csv2rdf4lod is the Java converter that transforms tabular data into RDF according to enhancement parameters described using the conversion vocabulary, csv2rdf4lod-automation is the set of shell script utilities that set up the directory structure, invoke csv2rdf4lod, and publish the results to /var/www.

csv2rdf4lod-automation can be used while replacing csv2rdf4lod with your own tabular converter. You might want to do this if your conversion is so out of whack that csv2rdf4lod's "RDFS-like paradigm" doesn't suit your needs. I've seen this twice in the thousands of datasets that I've helped people convert, and to be honest, I don't think their objectives were well designed.

Anyhoo, we should still be able to give you (most of) the provenance for free.

Your converter must be invokable from the command line, and its required dependencies must be installed. The only interface between csv2rdf4lod-automation and the converter is the set of arguments that csv2rdf4lod-automation passes to it and how your converter recognizes them.

The signature is a bit unwieldy (sorry!):

$csv2rdf $data $prov $sampleN -ep $destDir/$datafile.raw.params.ttl $overrideBaseURI $dumpExtensions \
   -w $destDir/$datafile.raw.sample.ttl -id $converterJarMD5 2>&1 | tee -a $CSV2RDF4LOD_LOG

The important bits:

  • Printing to stderr will be captured to a log.
  • $data is the input tabular file.
  • The thing after the -w is the output file; if you don't find one, then dump your output to stdout.
  • The thing after the -ep is the parameters file for your converter. Change how it behaves based on the contents of this file.

csv2rdf4lod-automation assumes that the converter's output is Turtle.

Return exit value 3 if you hit a parse error in the conversion parameters.
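The interface described above can be sketched as a small shell function. This is only an illustration under stated assumptions: the name my_converter, the example.org URIs, and the row-count "conversion" are made up, and an unreadable parameters file stands in for a real parse error. Only the handling of $data, -ep, -w, stderr, and exit value 3 comes from the description above.

```shell
# Hypothetical stand-in converter; the function name, the example.org URIs,
# and the row-count "conversion" are illustrative assumptions.
my_converter() {
    local data="${1:-}" params="" out="" rows
    if [ -z "$data" ] || [ ! -r "$data" ]; then
        echo "my_converter: cannot read input file '$data'" >&2
        return 1
    fi
    shift

    # Pick out -ep and -w; every other argument is ignored by this sketch.
    while [ $# -gt 0 ]; do
        case "$1" in
            -ep) params="${2:-}" ;;
            -w)  out="${2:-}" ;;
        esac
        shift
    done

    # A real converter would parse the parameters here. An unreadable file
    # stands in for a parse error, which is reported with exit value 3.
    if [ -n "$params" ] && [ ! -r "$params" ]; then
        echo "my_converter: cannot parse parameters '$params'" >&2
        return 3
    fi

    echo "my_converter: converting $data" >&2   # stderr ends up in the log
    rows=$(grep -c '' "$data" || true)

    # Output is Turtle: to the -w file if one was given, otherwise to stdout.
    if [ -n "$out" ]; then
        printf '<http://example.org/dataset> <http://example.org/rowCount> %s .\n' "$rows" > "$out"
    else
        printf '<http://example.org/dataset> <http://example.org/rowCount> %s .\n' "$rows"
    fi
}
```

In a standalone script, the file would end by calling my_converter "$@" and exiting with its return value, and CSV2RDF4LOD_CONVERTER would point at that script.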

Your converter gets invoked for 1) data conversion, 2) provenance of that data conversion, 3) sample data conversion, 4) provenance of the sample data conversion. To avoid 3 and 4, set CSV2RDF4LOD_CONVERT_SAMPLE_SUBSET_ONLY=false and CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY=false. Otherwise, you'd need to implement more in your converter...

Also, set CSV2RDF4LOD_CONVERT_PROVENANCE_FRBR=false if you want to recognize the -prov flag as the "do just provenance" invocation signature.
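As a sketch, those settings could be exported in the shell before running the conversion. The variable names and values are the ones given above; where you put the exports is up to you.

```shell
# Skip the sample invocations (3 and 4 above).
export CSV2RDF4LOD_CONVERT_SAMPLE_SUBSET_ONLY=false
export CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY=false

# Treat -prov as the "do just provenance" invocation.
export CSV2RDF4LOD_CONVERT_PROVENANCE_FRBR=false
```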

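Make the following file exist (create the lib directory first, then put your jar in it):

mkdir -p source/SSS/DDD/lib   # then place my.jar here, giving source/SSS/DDD/lib/my.jar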

Be able to run the following from your conversion cockpit:

java edu.rpi.tw.eScience.WaterQualityPortal.oboe.OBOEAgent source/my.csv -w automatic/my.csv.e1.ttl

Have the converter (and all of its dependencies) on your CLASSPATH environment variable.

Then run:

cd source/SSS/DDD/version/VVV/
export CSV2RDF4LOD_CONVERTER="java edu.rpi.tw.eScience.WaterQualityPortal.oboe.OBOEAgent"
./convert*.sh

You can put the export (and any other path settings) into source/SSS/DDD/version/csv2rdf4lod-source-me.sh (as described at Reusing enhancement parameters for multiple versions or datasets).
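A minimal sketch of what such a source-me file might contain, assuming my.jar sits in source/SSS/DDD/lib/ and the file is sourced from a conversion cockpit at source/SSS/DDD/version/VVV/. The relative lib_dir path is an assumption for illustration; an absolute path is safer if you source the file from elsewhere.

```shell
# Hypothetical contents for source/SSS/DDD/version/csv2rdf4lod-source-me.sh.
# Assumes the current directory is source/SSS/DDD/version/VVV/.
lib_dir="../../lib"

# Append the converter jar, keeping any CLASSPATH that is already set.
export CLASSPATH="${CLASSPATH:+$CLASSPATH:}$lib_dir/my.jar"

export CSV2RDF4LOD_CONVERTER="java edu.rpi.tw.eScience.WaterQualityPortal.oboe.OBOEAgent"
```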
