
Generating enhancement parameters

timrdf edited this page May 24, 2012 · 41 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

See Conversion process phase: pull conversion trigger, which is where enhancement parameters are generated for you.

Introduction

Enhancement parameters are automatically generated and placed in manual/ when the initial raw conversion is performed. As described in Directory Conventions, the purpose of the manual/ directory is to hold every file that has had a human's touch. Although the enhancement parameters are generated automatically, someone still needs to refine them by asserting more than the default [ conversion:range todo:Literal; ].

Verbatim (i.e., initial, raw) parameters

./convert-federal_research_and_development_budget_for_networking_and_information_technology.sh

writes automatic/*.raw.params.ttl every time it performs the verbatim conversion.

Enhancement 1 parameters

./convert-federal_research_and_development_budget_for_networking_and_information_technology.sh

writes manual/*.e1.params.ttl if they are not there.

Enhancement 2 parameters

./convert-federal_research_and_development_budget_for_networking_and_information_technology.sh -e 2

writes manual/*.e2.params.ttl if they are not there.

Enhancement N parameters

Same as Enhancement 2 above, but pass N instead of 2 as the value of -e; this writes manual/*.eN.params.ttl if they are not there.
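The "writes ... if they are not there" behavior in the sections above is what protects hand edits in manual/ from being overwritten on a later run. A minimal sketch of that guard follows; it is an illustration only, not the code inside the generated conversion script. The helper name write_params_if_missing and the file name example.e1.params.ttl are hypothetical.

```shell
# Hypothetical helper, not part of csv2rdf4lod-automation.
# $1 = path of the parameters file; the template is read from stdin.
write_params_if_missing() {
  if [ -e "$1" ]; then
    # A human may have edited this file, so leave it alone.
    echo "keeping existing $1" >&2
    cat > /dev/null            # discard the unused template
  else
    mkdir -p "$(dirname "$1")"
    cat > "$1"
  fi
}

# Demonstration in a scratch directory: the second call does not overwrite.
demo=$(mktemp -d)
echo '# generated template'   | write_params_if_missing "$demo/manual/example.e1.params.ttl"
echo '# regenerated template' | write_params_if_missing "$demo/manual/example.e1.params.ttl"
cat "$demo/manual/example.e1.params.ttl"   # still the first template
```

By contrast, the verbatim parameters in automatic/ are rewritten on every run, so no guard of this kind applies to them.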

Authorship of enhancement parameters

When the default enhancement parameters are created, the environment variables CSV2RDF4LOD_CONVERT_MACHINE_URI and CSV2RDF4LOD_CONVERT_PERSON_URI are used to capture information about the machine and the person responsible. This can then be used to acknowledge the person's effort and to calculate the impact their data curation has on subsequent data products and demonstrations. The Unix command whoami is also used to describe who created the parameters.
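As a sketch, the two variables could be set in the shell before running the conversion script. The URIs below are placeholders under example.org, not values taken from this project; substitute URIs that identify your own machine and yourself.

```shell
# Placeholder values (assumptions for illustration only).
export CSV2RDF4LOD_CONVERT_MACHINE_URI="http://example.org/id/machine/my-laptop"
export CSV2RDF4LOD_CONVERT_PERSON_URI="http://example.org/id/person/jane-doe"

# Show what a conversion started from this shell would see,
# plus the local user name reported by whoami.
echo "machine: $CSV2RDF4LOD_CONVERT_MACHINE_URI"
echo "person:  $CSV2RDF4LOD_CONVERT_PERSON_URI"
echo "user:    $(whoami)"
```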

Implementation

NOTE: These implementation details are not necessary to use csv2rdf4lod-automation to convert data; they are provided here for informational purposes only.

java edu.rpi.tw.data.csv.impl.CSVHeaders <file> [headerLineNumber]

Returns the values in the first row of a CSV file -- one per line. Other rows can be returned by passing a row number as the optional second argument (headerLineNumber).

bash-3.2$ java edu.rpi.tw.data.csv.impl.CSVHeaders manual/FY99_Supp_CIC_R\&D_Budget_Cross_Cut.csv
Reference:
FY 1999 Supplement to the President's Budget






bash-3.2$ 

The headers are actually on the fourth row of this CSV:

bash-3.2$ java edu.rpi.tw.data.csv.impl.CSVHeaders manual/FY99_Supp_CIC_R\&D_Budget_Cross_Cut.csv 4
Agency

High End Computing and Computation
Large Scale Networking

High Confidence Systems
Human Centered Systems
Education, Training, & Human Resources
TOTAL
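To make the row lookup in the transcripts above concrete, here is a rough shell equivalent: print the cells of one CSV row, one per line, with empty cells as empty lines. It is a sketch, not the Java implementation. The function name csv_row_values is hypothetical, and the code assumes that no quoted cell contains a line break.

```shell
# Hypothetical stand-in for CSVHeaders, for illustration only.
# $1 = CSV file, $2 = 1-based row number (defaults to 1).
csv_row_values() {
  awk -v row="${2:-1}" '
    NR == row {
      field = ""; inq = 0
      n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        if (c == "\"") {
          if (inq && substr($0, i + 1, 1) == "\"") {
            field = field "\""; i++       # doubled quote inside a quoted cell
          } else {
            inq = !inq                    # opening or closing quote
          }
        } else if (c == "," && !inq) {
          print field; field = ""         # cell boundary outside quotes
        } else {
          field = field c
        }
      }
      print field                         # last cell of the row
      exit
    }
  ' "$1"
}
```

Called as csv_row_values some.csv 4, it handles a quoted header such as "Education, Training, & Human Resources" as a single cell rather than splitting it at the commas.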

The script $CSV2RDF4LOD_HOME/bin/util/header2params2.awk accepts these headers and produces a Turtle RDF template for the enhancement parameters. It takes a handful of parameters for the source_identifier, dataset_identifier, etc. -- see the script for details.

....
@prefix ov:         <http://open.vocab.org/terms/> .
@prefix conversion: <http://purl.org/twc/vocab/conversion/> .
....

:dataset a void:Dataset;
   conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
   conversion:source_identifier  "nitrd-gov";
   conversion:dataset_identifier "federal_research_and_development_budget_for_networking_and_information_technology";
   conversion:dataset_version    "2011-Jan-27";
   conversion:conversion_process [
      a conversion:RawConversionProcess;
      conversion:enhancement_identifier "1";
      conversion:subject_discriminator  "fy99_supp_cic_r&d_budget_cross_cut";
      conversion:enhance [      
         ov:csvRow 4;
         a conversion:HeaderRow;
      ];                        
      conversion:enhance [
         ov:csvCol         1;
         ov:csvHeader     "Agency";
         conversion:label "Agency";
         conversion:comment "";
         conversion:range  todo:Literal;
      ];
      conversion:enhance [
         ov:csvCol         2;
         ov:csvHeader     "";
         conversion:label "";
         conversion:comment "";
         conversion:range  todo:Literal;
      ];
      conversion:enhance [
         ov:csvCol         3;
         ov:csvHeader     "High End Computing and Computation";
         conversion:label "High End Computing and Computation";
         conversion:comment "";
         conversion:range  todo:Literal;
      ];
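The per-column blocks in the template above follow a regular pattern: one conversion:enhance block per header, numbered from 1, with the header repeated as the label and the range left as todo:Literal. The sketch below reproduces only that pattern from a list of headers on stdin. It is not header2params2.awk itself: the function name headers_to_enhancements is hypothetical, and the prefixes, dataset description, and header-row block are left out.

```shell
# Hypothetical illustration of the per-column pattern; not the project's script.
# Reads one header per line on stdin, writes one conversion:enhance block each.
headers_to_enhancements() {
  awk '
    # Escape backslashes and double quotes so the header is a valid Turtle string.
    function esc(s,   out, i, c) {
      out = ""
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (c == "\"" || c == "\\") out = out "\\"
        out = out c
      }
      return out
    }
    {
      h = esc($0)
      print  "      conversion:enhance ["
      printf "         ov:csvCol         %d;\n", NR
      printf "         ov:csvHeader     \"%s\";\n", h
      printf "         conversion:label \"%s\";\n", h
      print  "         conversion:comment \"\";"
      print  "         conversion:range  todo:Literal;"
      print  "      ];"
    }
  '
}

# Example input: three headers, the second one empty.
printf 'Agency\n\nHigh End Computing and Computation\n' | headers_to_enhancements
```

Because CSVHeaders prints one header per line, its output could in principle be piped straight into a function like this one.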

What's next?
