
Example: EPA's Enforcement & Compliance History Online (ECHO)

Timothy Lebo edited this page Feb 14, 2012 · 94 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

See Examples.

This example comes from an actual use case involving illness among people in an area of Rhode Island. The initial prototype was built in Fall 2010 by Evan Patton, Ping Wang, and Jin Zheng; it scoped the data to Rhode Island, used a custom converter, and developed an ontology. In Spring 2011, Ping Wang, Tim Lebo, and Jin Zheng extended it to cover more regions and adopted csv2rdf4lod. This page provides a good overview of what the ECHO system provides (and the visual presentations they use).

  • Steps D1.1 through D1.8 name, retrieve, and perform naive conversion for the ECHO facilities.
  • Steps D2.1 through D2.7 name, retrieve, and perform naive conversion for the ECHO measurements at the facilities.
  • Steps D1.9 through ?? enhance the naive conversion of ECHO facilities with a cell-based subjects conversion.
  • Steps D2.8 through ?? enhance the naive conversion of measurements at ECHO facilities with a cell-based subjects conversion.

Step D1.1: Identifying the source organization

(see naming phase)

Given nothing but a web page for the data (http://www.epa-echo.gov/echo/), we need to figure out who is providing it.

epa-gov

Step D1.2: Identifying the source organization's dataset identifiers

(see naming phase)

enforcement-and-compliance-history-online-echo-facilities

Following the naming conventions, the identifier includes both the acronym's expansion and the acronym itself. Facilities and measurements are treated as implicit sub-datasets because they have different structures; this lets two global interpretations be applied to their corresponding structures, and is an artifact of how the automation applies global parameters. The relationship can be encoded explicitly with skos:broader later, when needed.
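An identifier of this shape can be derived mechanically from a dataset title. The following is a minimal sketch, assuming the convention is to lowercase the title and collapse runs of non-alphanumeric characters into hyphens; `dataset_id` is a hypothetical helper and not part of csv2rdf4lod, and the input title (with "and" spelled out) is an assumption.

```shell
# Hypothetical helper (not part of csv2rdf4lod): derive a dataset identifier
# from a human-readable title by lowercasing it and collapsing every run of
# non-alphanumeric characters into a single hyphen.
dataset_id() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9][^a-z0-9]*/-/g' -e 's/^-//' -e 's/-$//'
}

# Assumed title; prints enforcement-and-compliance-history-online-echo-facilities
dataset_id "Enforcement and Compliance History Online (ECHO) Facilities"
```

Keeping both the expansion and the acronym in the title means both survive into the identifier.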

Step D1.3: Identifying the dataset version

(see naming phase)

bash-3.2$ mkdir -p source/epa-gov/enforcement-and-compliance-history-online-echo-facilities/
bash-3.2$ cd source/epa-gov/enforcement-and-compliance-history-online-echo-facilities/
bash-3.2$ urldate.sh http://www.epa-echo.gov/ideadownloads/2010/ICIS_NPDES.zip
2010-Sep-09
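The version identifier has the form YYYY-Mon-DD. As a rough sketch of how such an identifier could be produced, the helper below reformats an HTTP Last-Modified header line. It is an assumption that the date comes from that header; `version_id` is a hypothetical name, and the sample header (including its time of day) is made up for illustration.

```shell
# Hypothetical helper: turn an HTTP Last-Modified header line into a
# version identifier of the form YYYY-Mon-DD.
version_id() {
  # Fields: "Last-Modified:" "Thu," "09" "Sep" "2010" "00:00:00" "GMT"
  printf '%s\n' "$1" | awk '{ print $5 "-" $4 "-" $3 }'
}

# Illustrative header value; the time of day is invented.
version_id "Last-Modified: Thu, 09 Sep 2010 00:00:00 GMT"
```

Naming the version after the source's modification date, not the retrieval date, keeps the identifier stable across repeated retrievals of an unchanged file.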

Step D1.4: Retrieving the source data

(see retrieving phase)

bash-3.2$ mkdir -p version/2010-Sep-09/source/
bash-3.2$ cd version/2010-Sep-09/source/
bash-3.2$ pcurl.sh http://www.epa-echo.gov/ideadownloads/2010/ICIS_NPDES.zip

Step D1.5: Uncompressing with provenance

bash-3.2$ punzip.sh ICIS_NPDES.zip

---------------------------------- punzip ---------------------------------------
ICP10.TXT came from ICIS_NPDES.zip
ICP01.TXT came from ICIS_NPDES.zip
ICP01A.TXT came from ICIS_NPDES.zip
ICP02.TXT came from ICIS_NPDES.zip
ICP03.TXT came from ICIS_NPDES.zip
ICP04.TXT came from ICIS_NPDES.zip
ICP05.TXT came from ICIS_NPDES.zip
ICP06.TXT came from ICIS_NPDES.zip
ICP07.TXT came from ICIS_NPDES.zip
ICP08.TXT came from ICIS_NPDES.zip
ICP09.TXT came from ICIS_NPDES.zip
--------------------------------------------------------------------------------

Step D1.6: Getting (and staying) in the conversion cockpit

bash-3.2$ cd source/epa-gov/enforcement-and-compliance-history-online-echo-facilities/version/2010-Sep-09
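The conversion cockpit path is composed from the three identifiers chosen in steps D1.1 through D1.3. A small sketch of that composition follows; `cockpit_path` is a hypothetical helper, not a csv2rdf4lod script.

```shell
# Hypothetical helper: compose the conversion cockpit path from the source,
# dataset, and version identifiers (in that order).
cockpit_path() {
  printf 'source/%s/%s/version/%s\n' "$1" "$2" "$3"
}

cockpit_path epa-gov \
  enforcement-and-compliance-history-online-echo-facilities \
  2010-Sep-09
```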

Step D1.7: Creating the conversion trigger

(see create trigger phase)

The source file is pipe-delimited, so the cell delimiter is set to a pipe (|).

bash-3.2$ ls
source

bash-3.2$ cr-create-convert-sh.sh -w --delimiter \| source/ICP01.TXT

bash-3.2$ ls -lt
total 8
-rwxr-xr-x   1 lebot  staff  2164 Apr  4 15:04 convert-enforcement-and-compliance-history-online-echo-facilities.sh
drwxr-xr-x  26 lebot  staff   884 Apr  4 15:04 source
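Before choosing a delimiter, it can help to confirm that it splits every row into the same number of cells. The sketch below does this with awk; `count_cells` is a hypothetical helper and the sample rows are illustrative, not taken from ICP01.TXT.

```shell
# Print the number of cells in each row of pipe-delimited input on stdin.
count_cells() {
  awk -F'|' '{ print NF }'
}

# Illustrative rows only; an empty cell between two pipes still counts.
printf '%s\n' 'A|B|C' 'x||z' | count_cells
```

Both sample rows report three cells. On a real file, piping the counts through `sort -u` should yield a single number if the delimiter is right.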

Step D1.8: Pulling the conversion trigger

(see pull trigger phase)

bash-3.2$ ./convert-enforcement-and-compliance-history-online-echo-facilities.sh

Step D2.1: Identifying the source organization

(see naming phase)

Same as D1.1; epa-gov.

Step D2.2: Identifying the source organization's dataset identifiers

(see naming phase)

enforcement-and-compliance-history-online-echo-measurements
bash-3.2$ mkdir -p source/epa-gov/enforcement-and-compliance-history-online-echo-measurements/
bash-3.2$ cd source/epa-gov/enforcement-and-compliance-history-online-echo-measurements/

Step D2.3: Identifying the dataset version

Following naming conventions, using retrieval date: 2011-Apr-04.

bash-3.2$ mkdir -p version/2011-Apr-04/source/
bash-3.2$ cd version/2011-Apr-04/

Step D2.4: Retrieving the source data

(see retrieving phase and pcurl.sh)

bash-3.2$ cd source/
bash-3.2$ pcurl.sh http://www.epa-echo.gov/cgi-bin/effluentdata.cgi \
         -F "permit=NY0261343" -F "hits=1" -n NY0261343 -e csv

Step D2.5: Getting (and staying) in the conversion cockpit

bash-3.2$ cd source/epa-gov/enforcement-and-compliance-history-online-echo-measurements/version/2011-Apr-04/

Step D2.6: Creating the conversion trigger

(see create trigger phase)

bash-3.2$ ls
source
bash-3.2$ cr-create-convert-sh.sh -w source/NY0261343.csv
bash-3.2$ ls -lt
total 8
-rwxr-xr-x  1 lebot  staff  2178 Apr  4 23:00 convert-enforcement-and-compliance-history-online-echo-measurements.sh
drwxr-xr-x  4 lebot  staff   136 Apr  4 22:58 source

Step D2.7: Pulling the conversion trigger

(see pull trigger phase)

bash-3.2$ ./convert-enforcement-and-compliance-history-online-echo-measurements.sh

Step D1.9: Enhance facilities


Added:

   conversion:conversion_process [
      conversion:interpret [          
         conversion:symbol        "";
         conversion:interpretation conversion:null; 
      ];

(This removed the triples whose object was the empty string "".)
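The effect of interpreting "" as null can be pictured as skipping empty cells before any triple is written. The sketch below only illustrates that idea and is not how the converter is implemented; `non_empty_cells` is a hypothetical helper, the sample row is invented, and quoted CSV fields are not handled.

```shell
# Emit one "column value" pair per non-empty cell of comma-separated input;
# empty cells are skipped, mirroring the interpretation of "" as null.
non_empty_cells() {
  awk -F',' '{ for (i = 1; i <= NF; i++) if ($i != "") print i, $i }'
}

# Illustrative row: the second cell is empty and produces no output.
printf '%s\n' 'NY0261343,,TROY' | non_empty_cells
```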


Added:

      conversion:enhance [
         ov:csvCol          1;
         conversion:range   rdfs:Resource;
         conversion:range_name "Permit";
      ];

Got:

@prefix typed_permit: <http://logd.tw.rpi.edu/source/epa-gov/dataset/enforcement-and-compliance-history-online-echo-measurements/typed/permit/> .

:thing_2 
  e1:permitid typed_permit:NY0261343 .

typed_permit:NY0261343 dcterms:identifier "NY0261343" ;
   a local_vocab:Permit ;
   rdfs:label "NY0261343" .
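The URI of the typed resource follows a visible pattern: dataset base, then typed/, then the lowercased range name, then the cell value. A sketch of that pattern is below, assuming the cell value needs no escaping; `typed_uri` is a hypothetical helper, and the real converter may treat special characters differently.

```shell
# Hypothetical helper: build the URI of a typed resource from the dataset
# base URI, the range name, and the cell value.
typed_uri() {
  range=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  printf '%s/typed/%s/%s\n' "$1" "$range" "$3"
}

BASE=http://logd.tw.rpi.edu/source/epa-gov/dataset/enforcement-and-compliance-history-online-echo-measurements
typed_uri "$BASE" Permit NY0261343
```

Because the URI depends only on the range name and the cell value, every row that mentions permit NY0261343 refers to the same resource.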

Added:

      conversion:enhance [
         ov:csvCol          2;  
         ov:csvHeader       "DB";
         conversion:subproperty_of dcterms:source;
         conversion:range   rdfs:Resource;
         conversion:range_name "Database";
      ];

Got:

@prefix typed_database: <http://logd.tw.rpi.edu/source/epa-gov/dataset/enforcement-and-compliance-history-online-echo-measurements/typed/database/> .

:thing_2
   e1:db typed_database:ICIS-NPDES .

typed_database:ICIS-NPDES 
   dcterms:identifier "ICIS-NPDES" ;
   a local_vocab:Database ;
   rdfs:label "ICIS-NPDES" .

Added:

:address_bundle
   a conversion:ImplicitBundle;
   conversion:property_name con:address;
   conversion:type_name     con:Address;
.

   conversion:conversion_process [
      conversion:enhance [
         ov:csvCol          4;
         ov:csvHeader       "Address";
         conversion:bundled_by :address_bundle;
         conversion:range   todo:Literal;
      ];
      conversion:enhance [
         ov:csvCol          5;
         ov:csvHeader       "City";
         conversion:bundled_by :address_bundle;
         conversion:range   todo:Literal;
      ];
      conversion:enhance [
         ov:csvCol          6;
         ov:csvHeader       "State";
         conversion:bundled_by :address_bundle;
         conversion:range   todo:Literal;
      ];
      conversion:enhance [
         ov:csvCol          7;
         ov:csvHeader       "ZIP";
         conversion:bundled_by :address_bundle;
      ];

Got:

:thing_2
   con:address implicit_address:address_2 ;

implicit_address:address_2 a local_vocab:Address ;
   e1:address "3992 NY ROUTE 2" ;
   e1:city "TROY" ;
   e1:state "NY" ;
   e1:zip "12180" .

Added:

@prefix con:        <http://www.w3.org/2000/10/swap/pim/contact#> .

      conversion:enhance [
         ov:csvCol          4;
         ov:csvHeader       "Address";
         conversion:equivalent_property con:street;
      ];
      conversion:enhance [
         ov:csvCol          5;
         ov:csvHeader       "City";
         conversion:equivalent_property con:city;
      ];
      conversion:enhance [
         ov:csvCol          6;
         ov:csvHeader       "State";
         conversion:equivalent_property con:stateOrProvince;
      ];
      conversion:enhance [
         ov:csvCol          7;
         ov:csvHeader       "ZIP";
         conversion:equivalent_property con:zip;
      ];

Got:

implicit_address:address_2 a local_vocab:Address ;
   con:street          "3992 NY ROUTE 2" ;
   con:city            "TROY" ;
   con:stateOrProvince "NY" ;
   con:zip             "12180" .

Added:

      conversion:enhance [
         ov:csvCol          6;
         ov:csvHeader       "State";
         conversion:bundled_by :address_bundle;
         conversion:equivalent_property con:stateOrProvince;
         conversion:links_via <http://www.rpi.edu/~lebot/lod-links/state-fips-dbpedia.ttl>,
                              <http://www.rpi.edu/~lebot/lod-links/state-fips-geonames.ttl>,
                              <http://www.rpi.edu/~lebot/lod-links/state-fips-govtrack.ttl>;
         conversion:subject_of dcterms:identifier;
         conversion:range   rdfs:Resource;
      ];

Got:

@prefix local_vocab: <http://logd.tw.rpi.edu/source/epa-gov/dataset/enforcement-and-compliance-history-online-echo-measurements/vocab/> .

@prefix govtrackusgov: <http://www.rdfabout.com/rdf/usgov/geo/us/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .

implicit_address:address_2 
   a local_vocab:Address ;
   con:street "3992 NY ROUTE 2" ;
   con:city   "TROY" ;
   con:stateOrProvince typed_state:NY .

typed_state:NY 
   dcterms:identifier "NY" ;
   a local_vocab:State ;
   rdfs:label "NY" ;
   owl:sameAs <http://sws.geonames.org/5128638/> , 
              govtrackusgov:NY , 
              dbpedia:New_York .

Added:

      conversion:enhance [
         ov:csvCol          1;
         ov:csvHeader       "PermitID";
         conversion:label   "Checks conformance of";
      ];

Got:

:thing_2 
   e1:checks_conformance_of typed_permit:NY0261343 ;

Added:

      conversion:enhance [
         ov:csvCol          18;
         ov:csvHeader       "DATE";
         conversion:equivalent_property dcterms:date;
         conversion:eg           "20071031";
         conversion:date_pattern "yyyyMMdd";
         conversion:range        xsd:date;
      ];

Got:

:thing_2
   dcterms:date "2007-10-31"^^xsd:date ;
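The date pattern yyyyMMdd tells the converter how to read the cell so that it can emit an xsd:date. The same reshaping can be sketched with sed; `to_xsd_date` is a hypothetical helper that checks only the digit layout, not whether the date is valid on the calendar.

```shell
# Hypothetical helper: reshape a yyyyMMdd string into the yyyy-MM-dd lexical
# form of xsd:date. Input that is not exactly eight digits prints nothing.
to_xsd_date() {
  printf '%s\n' "$1" \
    | sed -n 's/^\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)$/\1-\2-\3/p'
}

to_xsd_date 20071031
```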

Added:

      conversion:enhance [
         ov:csvCol          20;
         ov:csvHeader       "C1_VALUE";

         a scovo:Item;
         conversion:label   "Test Type";
         conversion:object "[/sd]typed/test/C1";

         conversion:comment "";
         conversion:range   xsd:decimal;
      ];

      conversion:enhance [
         ov:csvCol          31;
         ov:csvHeader       "C2_VALUE";
         
         a scovo:Item;
         conversion:label   "Test Type";
         conversion:object "[/sd]typed/test/C2";

         conversion:comment "";
         conversion:range   xsd:decimal;
      ];

      conversion:enhance [
         ov:csvCol          42;
         ov:csvHeader       "C3_VALUE";

         a scovo:Item;
         conversion:label   "Test Type";
         conversion:object "[/sd]typed/test/C3";
         
         conversion:comment "";
         conversion:range   xsd:decimal;
      ];

      conversion:enhance [
         ov:csvCol          53;
         ov:csvHeader       "Q1_VALUE";

         a scovo:Item;
         conversion:label   "Test Type";
         conversion:object "[/sd]typed/test/Q1";
         
         conversion:comment "";
         conversion:range   xsd:decimal;
      ];

      conversion:enhance [
         ov:csvCol          64;
         ov:csvHeader       "Q2_VALUE";

         a scovo:Item;
         conversion:label   "Test Type";
         conversion:object "[/sd]typed/test/Q2";

         conversion:comment "";
         conversion:range   xsd:decimal;
      ];

Got:

todo

Added:

(the bundled_bys)

Got:

:thing_2_53
   e1:checks_conformance_of typed_permit:NY0261343 ;
   e1:db typed_database:ICIS-NPDES ;
   dcterms:source typed_database:ICIS-NPDES ;
   e1:name "BRUNSWICK CENTRAL SCHOOL DIST" ;
   con:address implicit_address:address_2 ;
   e1:status "Effective" ;
   e1:ownership "City government" ;
   e1:pipe "001" ;
   e1:paramtr "00056" ;
   e1:name_2 "Flow rate" ;
   e1:monlocn "1" ;
   e1:name_3 "Effluent gross" ;
   e1:period "1" ;
   dcterms:date "2007-10-31"^^xsd:date ;
   e1:test_type <http://logd.tw.rpi.edu/source/epa-gov/dataset/enforcement...-echo-measurements/typed/test/Q1> ;
   rdf:value "5794"^^xsd:decimal ;
   ov:csvRow "2"^^xsd:integer ;
   ov:csvCol "53"^^xsd:integer ;
   e1:unit "GPD" ;
   e1:lsense "<=" ;
   e1:lval "11220" ;
   e1:lunit "GPD" ;
   e1:ltype "avg" .

TODO: no triples should be produced for :thing_2_20 :thing_2_31 :thing_2_42 :thing_2_64 (but how to specify that?).
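The subject names above combine the row and column of the value cell (:thing_2_53 is row 2, column 53). The behavior the TODO asks for can be pictured as naming a subject only for value cells that are non-empty. The sketch below illustrates that outcome; it does not show how to express it in the enhancement parameters. `cell_subjects` is a hypothetical helper, the five-column row is invented, and quoted CSV fields are not handled.

```shell
# For each comma-separated row on stdin, print a subject name for every
# listed value column whose cell is non-empty.
# Arguments: row number, then a comma-separated list of column numbers.
cell_subjects() {
  awk -F',' -v row="$1" -v cols="$2" '{
    n = split(cols, c, ",")
    for (i = 1; i <= n; i++)
      if ($(c[i]) != "") print "thing_" row "_" c[i]
  }'
}

# Illustrative row: of columns 2 through 5, only column 4 carries a value.
printf '%s\n' 'NY0261343,,,5794,' | cell_subjects 2 '2,3,4,5'
```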


Added:

      conversion:enhance [
         ov:csvCol          21;
         ov:csvHeader       "C1_UNIT";
         conversion:equivalent_property muo:measuredIn;
         conversion:range   rdfs:Resource;
         conversion:range_name   "Unit";
      ];

(and similar for all other *_UNIT)

Got:

@prefix value_of_unit: <http://logd.tw.rpi.edu/source/epa-gov/dataset/enforcement-and-compliance-history-online-echo-measurements/value-of/unit/> .

:thing_2_53
   rdf:value "5794"^^xsd:decimal ;
   muo:measuredIn typed_unit:GPD .

Added:

      conversion:enhance [
         ov:csvCol          2;
         ov:csvHeader       "DB";
         conversion:predicate rdfs:seeAlso;
         conversion:object    <http://www.epa-echo.gov/echo/compliance_report_water.html>;
      ];

Got:

typed_database:ICIS-NPDES 
   dcterms:identifier "ICIS-NPDES" ;
   a local_vocab:Database ;
   rdfs:label "ICIS-NPDES" ;
   rdfs:seeAlso <http://www.epa-echo.gov/echo/compliance_report_water.html> .

Added:

<http://logd.tw.rpi.edu/source/epa-gov/dataset/enforcement-and-compliance-history-online-echo-measurements/version/2011-Apr-04/conversion/enhancement/1>
   a conversion:LayerDataset, void:Dataset;

   rdfs:seeAlso  <http://www.epa-echo.gov/echo/effluents_help.html>;
   foaf:homepage <http://www.epa-echo.gov/echo/>;

Got:

<http://logd.tw.rpi.edu/source/epa-gov/dataset/enforcement-and-compliance-history-online-echo-measurements/version/2011-Apr-04/conversion/enhancement/1> 
   a epa-gov_vocab:Dataset , conversion:Dataset , conversion:LayerDataset , void:Dataset ;
   dcterms:modified "2011-04-05T11:17:44.343-05:00"^^xsd:dateTime ;
   rdfs:seeAlso <http://www.epa-echo.gov/echo/effluents_help.html> ;
   foaf:homepage <http://www.epa-echo.gov/echo/> ;

Step D2.9: Enhance measurements at facilities

Table 1: Contents of Effluent Data Download Records at http://www.epa-echo.gov/echo/effluents_help.html (bottom)

TODO
