
Generating a sample conversion using only a subset of data

Timothy Lebo edited this page Feb 14, 2012 · 27 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

When developing enhancement parameters, it is helpful to see the results as they are added. This iterative process can be sped up by converting only a portion of a large CSV. Since a sample subset is already created as part of the conversion,

~/Desktop/source/fludb-org/animal-surveillance/version/2010-Nov-30
bash-3.2$ l automatic/a*
-rw-r--r--  1 lebot  staff      18904 Dec 16 17:33 automatic/avian.txt.csv.raw.void.ttl
-rw-r--r--  1 lebot  staff  158321259 Dec 16 17:33 automatic/avian.txt.csv.raw.ttl
-rw-r--r--  1 lebot  staff      44692 Dec 16 17:32 automatic/avian.txt.csv.raw.sample.ttl   <- Samples are automatic.
-rw-r--r--  1 lebot  staff        776 Dec 16 17:31 automatic/avian.txt.csv.raw.params.ttl

all that we need to do is turn off the "full" conversion by setting the CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY environment variable to true.

First, check its current value (false in this case):

bash-3.2$ cr-vars.sh 
--
CSV2RDF4LOD_HOME                                         ~/Desktop/csv2rdf4lod-automation
CSV2RDF4LOD_BASE_URI                                     http://logd.tw.rpi.edu
CSV2RDF4LOD_BASE_URI_OVERRIDE                            (not required, $CSV2RDF4LOD_BASE_URI will be used.)
--
CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS                  (will default to: 2)
CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY                  false
...
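
The same values can also be read straight from the shell environment. The following is a minimal sketch, not part of csv2rdf4lod-automation: `show_exported` is a hypothetical helper name, and the `(not exported)` label is only an illustrative placeholder, not something cr-vars.sh prints.

```shell
# show_exported is a hypothetical helper, not part of csv2rdf4lod-automation.
# printenv reports only exported variables, i.e. the ones a conversion
# script started from this shell would actually inherit.
show_exported() {
    printenv "$1" || echo "(not exported)"
}

show_exported CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY
show_exported CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS
```

A variable that was assigned without `export` shows up as `(not exported)` here, which is a quick way to spot a setting that the conversion script would not see.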

Then turn on the "subset only" feature:

bash-3.2$ export CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY="true"

When running the enhancement:

~/Desktop/source/fludb-org/animal-surveillance/version/2010-Nov-30
bash-3.2$ ./convert-animal-surveillance.sh

Only the sample will be produced:

~/Desktop/source/fludb-org/animal-surveillance/version/2010-Nov-30
bash-3.2$ l automatic/a*
-rw-r--r--  1 lebot  staff      77646 Jan 28 08:27 automatic/avian.txt.csv.e1.sample.ttl   <- Only the sample is produced.
-rw-r--r--  1 lebot  staff        776 Jan 28 08:27 automatic/avian.txt.csv.raw.params.ttl
-rw-r--r--  1 lebot  staff      18904 Dec 16 17:33 automatic/avian.txt.csv.raw.void.ttl
-rw-r--r--  1 lebot  staff  158321259 Dec 16 17:33 automatic/avian.txt.csv.raw.ttl
-rw-r--r--  1 lebot  staff      44692 Dec 16 17:32 automatic/avian.txt.csv.raw.sample.ttl
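
Because `export` keeps the setting for the rest of the shell session, later conversions will also skip the full output until the variable is set back to false. One way to limit the setting to a single run is to wrap the command in a subshell. This is a sketch under that assumption: `run_sample_only` is a hypothetical helper name, not a script shipped with csv2rdf4lod-automation.

```shell
# run_sample_only is a hypothetical helper, not part of csv2rdf4lod-automation.
# It runs the given command with subset-only conversion enabled, inside a
# subshell so the setting does not leak into the calling shell.
run_sample_only() {
    (
        export CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY="true"
        "$@"
    )
}

# Usage, from the conversion cockpit directory:
#   run_sample_only ./convert-animal-surveillance.sh
```

After the command returns, the variable in the calling shell still has whatever value it had before, so the next plain `./convert-animal-surveillance.sh` behaves as usual.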

Changing the number of samples

As shown by [cr-vars.sh](Script: cr-vars.sh) above, only two rows are sampled by default. This can be changed by setting CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS:

export CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS="10"
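
When trying several sample sizes in a row, it can be convenient to pass the row count per run instead of re-exporting it. The sketch below assumes the converter expects a positive whole number, which this page does not state explicitly; `run_with_sample_rows` is a hypothetical helper name.

```shell
# run_with_sample_rows is a hypothetical helper, not part of csv2rdf4lod-automation.
# Usage: run_with_sample_rows ROWS COMMAND [ARGS...]
run_with_sample_rows() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_with_sample_rows ROWS COMMAND [ARGS...]" >&2
        return 2
    fi
    rows="$1"
    shift
    # Assumption: ROWS must be a positive whole number. Reject empty values,
    # anything containing a non-digit, and values starting with 0.
    case "$rows" in
        ''|*[!0-9]*|0*)
            echo "ROWS must be a positive whole number, got: $rows" >&2
            return 2
            ;;
    esac
    # The subshell keeps the setting from leaking into the calling shell.
    (
        export CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS="$rows"
        "$@"
    )
}

# Usage, from the conversion cockpit directory:
#   run_with_sample_rows 10 ./convert-animal-surveillance.sh
```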

For a description of the difference between samples and examples, see Examples versus Samples.

(NOTE: EXAMPLE here is misleading and should be changed to SAMPLE)

