Generating a sample conversion using only a subset of data
When developing enhancement parameters, it is helpful to see the results of each parameter as it is added. This iterative process can be sped up by converting only a portion of a large CSV. Since a sample subset is already created as part of every conversion,
~/Desktop/source/fludb-org/animal-surveillance/version/2010-Nov-30
bash-3.2$ l automatic/a*
-rw-r--r-- 1 lebot staff 18904 Dec 16 17:33 automatic/avian.txt.csv.raw.void.ttl
-rw-r--r-- 1 lebot staff 158321259 Dec 16 17:33 automatic/avian.txt.csv.raw.ttl
-rw-r--r-- 1 lebot staff 44692 Dec 16 17:32 automatic/avian.txt.csv.raw.sample.ttl <- Samples are automatic.
-rw-r--r-- 1 lebot staff 776 Dec 16 17:31 automatic/avian.txt.csv.raw.params.ttl
all we need to do is turn off the "full" conversion by setting the CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY environment variable.
First, check its current value (false):
bash-3.2$ cr-vars.sh
--
CSV2RDF4LOD_HOME ~/Desktop/csv2rdf4lod-automation
CSV2RDF4LOD_BASE_URI http://logd.tw.rpi.edu
CSV2RDF4LOD_BASE_URI_OVERRIDE (not required, $CSV2RDF4LOD_BASE_URI will be used.)
--
CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS (will default to: 2)
CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY false
...
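If cr-vars.sh is not at hand, the two sample-related settings can also be read straight from the current shell. The sketch below is a minimal illustration; the function name show_sample_settings is hypothetical. Unlike cr-vars.sh, it does not know the defaults the converter falls back to, so an unset variable is simply reported as "(unset)".

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the two sample-related settings from the environment.
# ${VAR-text} substitutes "text" only when VAR is unset, so an exported
# empty value is still shown as empty rather than as "(unset)".
show_sample_settings() {
  echo "CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY=${CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY-(unset)}"
  echo "CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS=${CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS-(unset)}"
}
```

Running show_sample_settings in the shell above should report false for the subset switch, and either the exported row count or "(unset)" for the number of rows.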
Then turn on the "subset only" feature:
bash-3.2$ export CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY="true"
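Exporting the variable leaves subset-only mode on for the rest of the session, so a later full conversion would need it set back to false. As an alternative, the switch can be scoped to a single command. This is a sketch with a hypothetical function name (subset_only), assuming, as the export above implies, that the conversion script reads the variable from its environment:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one command with subset-only conversion turned on.
# The export happens inside a subshell ( ... ), so the value in the calling
# shell is left exactly as it was once the command finishes.
subset_only() {
  (
    export CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY="true"
    "$@"
  )
}
```

For example, `subset_only ./convert-animal-surveillance.sh` would produce only the sample for that one run, while a plain `./convert-animal-surveillance.sh` afterwards would use whatever value the shell already had.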
Then run the enhancement:
~/Desktop/source/fludb-org/animal-surveillance/version/2010-Nov-30
bash-3.2$ ./convert-animal-surveillance.sh
Only the sample will be produced:
~/Desktop/source/fludb-org/animal-surveillance/version/2010-Nov-30
bash-3.2$ l automatic/a*
-rw-r--r-- 1 lebot staff 77646 Jan 28 08:27 automatic/avian.txt.csv.e1.sample.ttl <- Only the sample is produced.
-rw-r--r-- 1 lebot staff 776 Jan 28 08:27 automatic/avian.txt.csv.raw.params.ttl
-rw-r--r-- 1 lebot staff 18904 Dec 16 17:33 automatic/avian.txt.csv.raw.void.ttl
-rw-r--r-- 1 lebot staff 158321259 Dec 16 17:33 automatic/avian.txt.csv.raw.ttl
-rw-r--r-- 1 lebot staff 44692 Dec 16 17:32 automatic/avian.txt.csv.raw.sample.ttl
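To confirm which outputs exist for a given layer without reading the whole listing, a small check can look for the sample and full files by name. This sketch assumes the naming pattern seen in the listings, `<file>.<layer>.sample.ttl` for the sample and `<file>.<layer>.ttl` for the full conversion; the full enhanced file name (e.g. avian.txt.csv.e1.ttl) is inferred by analogy with the raw layer, and the function name layer_outputs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the sample and full outputs exist for
# one layer (e.g. raw, e1) of a converted file.
# Assumes <file>.<layer>.sample.ttl and <file>.<layer>.ttl naming.
layer_outputs() {
  local dir="$1" file="$2" layer="$3"
  local sample="$dir/$file.$layer.sample.ttl"
  local full="$dir/$file.$layer.ttl"
  if [ -f "$sample" ]; then echo "sample: present"; else echo "sample: missing"; fi
  if [ -f "$full" ]; then echo "full: present"; else echo "full: missing"; fi
}
```

For the directory listed above, `layer_outputs automatic avian.txt.csv e1` should report the sample as present and the full output as missing, while the same call with `raw` should report both as present (left over from the earlier full conversion).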
As shown by cr-vars.sh above, only two rows are included in the sample by default. This can be changed with CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS:
export CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS="10"
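Because a typo in the row count would only surface when the conversion runs, it can be convenient to check the value before exporting it. This is a sketch with a hypothetical function name (set_sample_rows), and it assumes the converter expects a positive whole number; anything else is rejected and the current setting is left untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: export a new sample size after a basic sanity check.
# Assumes the row count must be a positive whole number.
set_sample_rows() {
  case "$1" in
    ''|*[!0-9]*)
      echo "not a whole number: '$1'" >&2
      return 1 ;;
  esac
  if [ "$1" -lt 1 ]; then
    echo "row count must be at least 1" >&2
    return 1
  fi
  export CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS="$1"
}
```

With this in place, `set_sample_rows 10` has the same effect as the export above, while `set_sample_rows ten` prints an error and keeps the previous value.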
For a description of the difference between samples and examples, see Examples versus Samples.
(NOTE: "EXAMPLE" in these variable names is misleading and should be changed to "SAMPLE".)