Example: EPA's Enforcement & Compliance History Online (ECHO)
See Examples.
This is an actual use case involving sick people in an area of Rhode Island. The initial prototype was built in Fall 2010 by Evan Patton, Ping Wang, and Jin Zheng; it scoped the data to Rhode Island, used a custom converter, and developed an ontology. In Spring 2011, Ping Wang, Tim Lebo, and Jin Zheng extended it to include more regions and adopted csv2rdf4lod. This page provides a good overview of what the ECHO system provides (and the visual presentations they use).
- Steps D1.1 through D1.8 name, retrieve, and perform naive conversion for the ECHO facilities.
- Steps D2.1 through D2.7 name, retrieve, and perform naive conversion for the ECHO measurements at the facilities.
- Steps D1.9 through ?? enhance the naive conversion of ECHO facilities with a cell-based subjects conversion.
- Steps D2.8 through ?? enhance the naive conversion of measurements at ECHO facilities with a cell-based subjects conversion.
(see naming phase)
Given nothing but a web page for the data (http://www.epa-echo.gov/echo/), we need to figure out who is providing it.
epa-gov
(see naming phase)
enforcement-and-compliance-history-online-echo-facilities
The identifier follows the naming conventions and includes both the expanded acronym and the acronym itself. The facilities and measurements data are treated as implicit sub-datasets because they have different structures. This split lets each of the two global interpretations apply to its own structure (an artifact of how the automation applies global parameters). The relationship can be encoded explicitly with skos:broader later, if and when needed.
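The naming convention described above can be sketched as a small slug function. This is an illustration only: `dataset_identifier` is a hypothetical helper, not part of csv2rdf4lod, and its rules (lowercasing, spelling out `&`, collapsing punctuation into hyphens) are assumptions inferred from the identifier shown on this page.

```python
import re


def dataset_identifier(title: str) -> str:
    """Turn a human-readable dataset title into a hyphenated identifier (assumed rules)."""
    text = title.replace("&", " and ").lower()
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")


print(dataset_identifier("Enforcement & Compliance History Online (ECHO) facilities"))
```

Keeping both the expanded name and the acronym in the title means both end up in the identifier.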
(see naming phase)
bash-3.2$ mkdir -p source/epa-gov/enforcement-and-compliance-history-online-echo-facilities/
bash-3.2$ cd source/epa-gov/enforcement-and-compliance-history-online-echo-facilities/
bash-3.2$ urldate.sh http://www.epa-echo.gov/ideadownloads/2010/ICIS_NPDES.zip
2010-Sep-09
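The version identifier above is a date in YYYY-Mon-DD form. As a rough sketch, and assuming (this page does not say so) that the date is taken from the URL's HTTP `Last-Modified` header, the formatting step could look like the following. The header value is made up to match the date in the transcript, and `version_identifier` is a hypothetical name.

```python
from email.utils import parsedate_to_datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def version_identifier(last_modified: str) -> str:
    """Format an HTTP date as a YYYY-Mon-DD version identifier."""
    when = parsedate_to_datetime(last_modified)
    # A fixed month table avoids depending on the process locale.
    return f"{when.year:04d}-{MONTHS[when.month - 1]}-{when.day:02d}"


# Hypothetical header value; only the resulting date appears in the transcript.
print(version_identifier("Thu, 09 Sep 2010 12:00:00 GMT"))
```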
(see retrieving phase)
bash-3.2$ mkdir -p version/2010-Sep-09/source/
bash-3.2$ cd version/2010-Sep-09/source/
bash-3.2$ pcurl.sh http://www.epa-echo.gov/ideadownloads/2010/ICIS_NPDES.zip
Step D1.5: Uncompressing with provenance
bash-3.2$ punzip.sh ICIS_NPDES.zip
---------------------------------- punzip ---------------------------------------
ICP10.TXT came from ICIS_NPDES.zip
ICP01.TXT came from ICIS_NPDES.zip
ICP01A.TXT came from ICIS_NPDES.zip
ICP02.TXT came from ICIS_NPDES.zip
ICP03.TXT came from ICIS_NPDES.zip
ICP04.TXT came from ICIS_NPDES.zip
ICP05.TXT came from ICIS_NPDES.zip
ICP06.TXT came from ICIS_NPDES.zip
ICP07.TXT came from ICIS_NPDES.zip
ICP08.TXT came from ICIS_NPDES.zip
ICP09.TXT came from ICIS_NPDES.zip
--------------------------------------------------------------------------------
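The punzip.sh output above pairs each extracted file with the archive it came from. The sketch below shows the same idea with Python's standard `zipfile` module. It is not how punzip.sh is implemented; `unzip_with_provenance` is a hypothetical helper, and the demo uses a throwaway archive with placeholder contents.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_with_provenance(archive: Path, dest: Path) -> dict[str, str]:
    """Extract every member of `archive` and record which archive it came from."""
    came_from = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            zf.extract(name, dest)
            came_from[name] = archive.name
    return came_from


# Self-contained demo with a throwaway archive standing in for ICIS_NPDES.zip.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "ICIS_NPDES.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("ICP01.TXT", "placeholder contents\n")
    for member, source in unzip_with_provenance(archive, Path(tmp)).items():
        print(f"{member} came from {source}")
```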
Step D1.6: Getting (and staying) in the conversion cockpit
bash-3.2$ cd source/epa-gov/enforcement-and-compliance-history-online-echo-facilities/version/2010-Sep-09
Step D1.7: Creating the conversion trigger
(see create trigger phase)
Setting the cell delimiter to pipes (|).
bash-3.2$ ls
source
bash-3.2$ cr-create-convert-sh.sh -w --delimiter \| source/ICP01.TXT
bash-3.2$ ls -lt
total 8
-rwxr-xr-x 1 lebot staff 2164 Apr 4 15:04 convert-enforcement-and-compliance-history-online-echo-facilities.sh
drwxr-xr-x 26 lebot staff 884 Apr 4 15:04 source
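The `--delimiter \|` option above declares that cells are separated by pipes rather than commas. As a quick illustration of what that means for parsing, here is Python's `csv` module reading a pipe-delimited sample. The column names and values are hypothetical; the actual layout of ICP01.TXT is not shown on this page.

```python
import csv
import io

# Hypothetical sample; the real ICP01.TXT columns are not shown on this page.
sample = "PERMIT_ID|NAME|STATE\nXX0000001|EXAMPLE FACILITY|RI\n"

rows = list(csv.reader(io.StringIO(sample), delimiter="|"))
header, first = rows[0], rows[1]
print(dict(zip(header, first)))
```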
Step D1.8: Pulling the conversion trigger
(see pull trigger phase)
bash-3.2$ ./convert-enforcement-and-compliance-history-online-echo-facilities.sh
(see naming phase)
Same as D1.1: epa-gov.
(see naming phase)
enforcement-and-compliance-history-online-echo-measurements
bash-3.2$ mkdir -p source/epa-gov/enforcement-and-compliance-history-online-echo-measurements/
bash-3.2$ cd source/epa-gov/enforcement-and-compliance-history-online-echo-measurements/
Following naming conventions, using retrieval date: 2011-Apr-04.
bash-3.2$ mkdir -p version/2011-Apr-04/source/
bash-3.2$ cd version/2011-Apr-04/
(see retrieving phase and pcurl.sh)
bash-3.2$ cd source/
bash-3.2$ pcurl.sh http://www.epa-echo.gov/cgi-bin/effluentdata.cgi \
-F "permit=NY0261343" -F "hits=1" -n NY0261343 -e csv
Step D2.5: Getting (and staying) in the conversion cockpit
bash-3.2$ cd source/epa-gov/enforcement-and-compliance-history-online-echo-measurements/version/2011-Apr-04/
Step D2.6: Creating the conversion trigger
(see create trigger phase)
bash-3.2$ ls
source
bash-3.2$ cr-create-convert-sh.sh -w source/NY0261343.csv
bash-3.2$ ls -lt
total 8
-rwxr-xr-x 1 lebot staff 2178 Apr 4 23:00 convert-enforcement-and-compliance-history-online-echo-measurements.sh
drwxr-xr-x 4 lebot staff 136 Apr 4 22:58 source
Step D2.7: Pulling the conversion trigger
(see pull trigger phase)
bash-3.2$ ./convert-enforcement-and-compliance-history-online-echo-measurements.sh
Added:
conversion:conversion_process [
conversion:interpret [
conversion:symbol "";
conversion:interpretation conversion:null;
];
(This got rid of triples whose object value was the empty string "").
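The effect of interpreting the empty string as null can be pictured as a filter over each row's cells. The following is a minimal sketch of that idea, not the converter's code: `drop_null_cells` is a hypothetical helper, and the row is invented using headers that appear on this page.

```python
def drop_null_cells(row: dict[str, str], null_symbols=frozenset({""})) -> dict[str, str]:
    """Keep only the cells whose value is not one of the null symbols."""
    return {header: value for header, value in row.items() if value not in null_symbols}


# Hypothetical row: only the C1_UNIT cell is empty.
row = {"PermitID": "NY0261343", "DB": "ICIS-NPDES", "C1_UNIT": ""}
print(drop_null_cells(row))
```

Cells removed this way never become triples, so no statement with an empty-string object is produced.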
Added:
conversion:enhance [
ov:csvCol 1;
conversion:range rdfs:Resource;
conversion:range_name "Permit";
];
Got:
@prefix typed_permit: <http://logd.tw.rpi.edu/source/epa-gov/dataset/enforcement-and-compliance-history-online-echo-measurements/typed/permit/> .
:thing_2
e1:permitid typed_permit:NY0261343 .
typed_permit:NY0261343 dcterms:identifier "NY0261343" ;
a local_vocab:Permit ;
rdfs:label "NY0261343" .
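Promoting a cell to a resource gives it a URI under the dataset's typed/ namespace, as the typed_permit prefix above shows. The sketch below reproduces that URI pattern. `typed_uri` is a hypothetical helper; lowercasing the range name is inferred from the output, and any escaping the converter may apply to unusual cell values is not modelled.

```python
BASE = ("http://logd.tw.rpi.edu/source/epa-gov/dataset/"
        "enforcement-and-compliance-history-online-echo-measurements/")


def typed_uri(range_name: str, value: str) -> str:
    """Build <BASE>typed/<range name>/<value> for a cell promoted to a resource."""
    return f"{BASE}typed/{range_name.lower()}/{value}"


uri = typed_uri("Permit", "NY0261343")
print(f'<{uri}> dcterms:identifier "NY0261343" ; rdfs:label "NY0261343" .')
```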
Added:
conversion:enhance [
ov:csvCol 2;
ov:csvHeader "DB";
conversion:subproperty_of dcterms:source;
conversion:range rdfs:Resource;
conversion:range_name "Database";
];
Got:
@prefix typed_database: <http://logd.tw.rpi.edu/source/epa-gov/dataset/enforcement-and-compliance-history-online-echo-measurements/typed/database/> .
:thing_2
e1:db typed_database:ICIS-NPDES .
typed_database:ICIS-NPDES
dcterms:identifier "ICIS-NPDES" ;
a local_vocab:Database ;
rdfs:label "ICIS-NPDES" .
Added:
:address_bundle
a conversion:ImplicitBundle;
conversion:property_name con:address;
conversion:type_name con:Address;
.
conversion:conversion_process [
conversion:enhance [
ov:csvCol 4;
ov:csvHeader "Address";
conversion:bundled_by :address_bundle;
conversion:range todo:Literal;
];
conversion:enhance [
ov:csvCol 5;
ov:csvHeader "City";
conversion:bundled_by :address_bundle;
conversion:range todo:Literal;
];
conversion:enhance [
ov:csvCol 6;
ov:csvHeader "State";
conversion:bundled_by :address_bundle;
conversion:range todo:Literal;
];
conversion:enhance [
ov:csvCol 7;
ov:csvHeader "ZIP";
conversion:bundled_by :address_bundle;
];
Got:
:thing_2
con:address implicit_address:address_2 .
implicit_address:address_2 a local_vocab:Address ;
e1:address "3992 NY ROUTE 2" ;
e1:city "TROY" ;
e1:state "NY" ;
e1:zip "12180" .
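The implicit bundle moves the four address cells off the row's subject and onto an intermediate node that the row points to. A minimal sketch of that restructuring, with triples modelled as plain tuples, follows. `bundle_address` is a hypothetical helper, and deriving the e1: property names by lowercasing the headers is an assumption based on the output above.

```python
def bundle_address(row_number: int, cells: dict[str, str]) -> list[tuple[str, str, str]]:
    """Move the bundled cells from the row subject onto an intermediate address node."""
    row_subject = f":thing_{row_number}"
    address = f"implicit_address:address_{row_number}"
    triples = [
        (row_subject, "con:address", address),
        (address, "a", "local_vocab:Address"),
    ]
    for header, value in cells.items():
        triples.append((address, f"e1:{header.lower()}", f'"{value}"'))
    return triples


cells = {"Address": "3992 NY ROUTE 2", "City": "TROY", "State": "NY", "ZIP": "12180"}
for s, p, o in bundle_address(2, cells):
    print(s, p, o, ".")
```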
Added:
@prefix con: <http://www.w3.org/2000/10/swap/pim/contact#> .
conversion:enhance [
ov:csvCol 4;
ov:csvHeader "Address";
conversion:equivalent_property con:street;
];
conversion:enhance [
ov:csvCol 5;
ov:csvHeader "City";
conversion:equivalent_property con:city;
];
conversion:enhance [
ov:csvCol 6;
ov:csvHeader "State";
conversion:equivalent_property con:stateOrProvince;
];
conversion:enhance [
ov:csvCol 7;
ov:csvHeader "ZIP";
conversion:equivalent_property con:zip;
];
Got:
implicit_address:address_2 a local_vocab:Address ;
con:street "3992 NY ROUTE 2" ;
con:city "TROY" ;
con:stateOrProvince "NY" ;
con:zip "12180" .
Added:
conversion:enhance [
ov:csvCol 6;
ov:csvHeader "State";
conversion:bundled_by :address_bundle;
conversion:equivalent_property con:stateOrProvince;
conversion:links_via <http://www.rpi.edu/~lebot/lod-links/state-fips-dbpedia.ttl>,
<http://www.rpi.edu/~lebot/lod-links/state-fips-geonames.ttl>,
<http://www.rpi.edu/~lebot/lod-links/state-fips-govtrack.ttl>;
conversion:subject_of dcterms:identifier;
conversion:range rdfs:Resource;
];
Got:
@prefix local_vocab: <http://logd.tw.rpi.edu/source/epa-gov/dataset/enforcement-and-compliance-history-online-echo-measurements/vocab/> .
@prefix govtrackusgov: <http://www.rdfabout.com/rdf/usgov/geo/us/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
implicit_address:address_2
a local_vocab:Address ;
con:street "3992 NY ROUTE 2" ;
con:city "TROY" ;
con:stateOrProvince typed_state:NY .
typed_state:NY
dcterms:identifier "NY" ;
a local_vocab:State ;
rdfs:label "NY" ;
owl:sameAs <http://sws.geonames.org/5128638/> ,
govtrackusgov:NY ,
dbpedia:New_York .
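The owl:sameAs statements above come from matching the state's identifier against the link sets named in conversion:links_via. The sketch below imitates that lookup with in-memory dictionaries. The structure of the real link set files is not shown on this page, so the dictionaries are hypothetical stand-ins; only the three target URIs are taken from the output above.

```python
# Hypothetical in-memory stand-ins for the three link sets named in
# conversion:links_via; each maps a state identifier to an external URI.
LINK_SETS = [
    {"NY": "http://dbpedia.org/resource/New_York"},
    {"NY": "http://sws.geonames.org/5128638/"},
    {"NY": "http://www.rdfabout.com/rdf/usgov/geo/us/NY"},
]


def same_as_links(identifier: str) -> list[str]:
    """Collect every external URI that the link sets associate with `identifier`."""
    return [links[identifier] for links in LINK_SETS if identifier in links]


for target in same_as_links("NY"):
    print(f"typed_state:NY owl:sameAs <{target}> .")
```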
Added:
conversion:enhance [
ov:csvCol 1;
ov:csvHeader "PermitID";
conversion:label "Checks conformance of";
];
Got:
:thing_2
e1:checks_conformance_of typed_permit:NY0261343 ;
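The new label changes the predicate's local name from permitid to checks_conformance_of. One plausible naming rule that reproduces both names is sketched below. `property_local_name` is a hypothetical helper, and the rule (lowercase, with runs of other characters turned into underscores) is inferred from the outputs on this page rather than taken from the converter.

```python
import re


def property_local_name(label: str) -> str:
    """Derive a predicate local name from a column label (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


print("e1:" + property_local_name("Checks conformance of"))
```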
Added:
conversion:enhance [
ov:csvCol 18;
ov:csvHeader "DATE";
conversion:equivalent_property dcterms:date;
conversion:eg "20071031";
conversion:date_pattern "yyyyMMdd";
conversion:range xsd:date;
];
Got:
:thing_2
dcterms:date "2007-10-31"^^xsd:date ;
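The date enhancement parses the raw cell with the given pattern and emits an xsd:date. The sketch below performs the same transformation in Python. It assumes the pattern letters follow Java's SimpleDateFormat conventions and translates only the three tokens this example needs; `to_xsd_date` is a hypothetical helper.

```python
from datetime import datetime

# Assumption: the pattern uses Java SimpleDateFormat letters; only the three
# tokens needed for this example are translated to strptime directives.
TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d"}


def to_xsd_date(cell: str, date_pattern: str) -> str:
    """Parse `cell` with a yyyy/MM/dd-style pattern and return an xsd:date lexical form."""
    fmt = date_pattern
    for token, directive in TOKENS.items():
        fmt = fmt.replace(token, directive)
    return datetime.strptime(cell, fmt).date().isoformat()


print(to_xsd_date("20071031", "yyyyMMdd"))
```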
Added:
conversion:enhance [
ov:csvCol 20;
ov:csvHeader "C1_VALUE";
a scovo:Item;
conversion:label "Test Type";
conversion:object "[/sd]typed/test/C1";
conversion:comment "";
conversion:range xsd:decimal;
];
conversion:enhance [
ov:csvCol 31;
ov:csvHeader "C2_VALUE";
a scovo:Item;
conversion:label "Test Type";
conversion:object "[/sd]typed/test/C2";
conversion:comment "";
conversion:range xsd:decimal;
];
conversion:enhance [
ov:csvCol 42;
ov:csvHeader "C3_VALUE";
a scovo:Item;
conversion:label "Test Type";
conversion:object "[/sd]typed/test/C3";
conversion:comment "";
conversion:range xsd:decimal;
];
conversion:enhance [
ov:csvCol 53;
ov:csvHeader "Q1_VALUE";
a scovo:Item;
conversion:label "Test Type";
conversion:object "[/sd]typed/test/Q1";
conversion:comment "";
conversion:range xsd:decimal;
];
conversion:enhance [
ov:csvCol 64;
ov:csvHeader "Q2_VALUE";
a scovo:Item;
conversion:label "Test Type";
conversion:object "[/sd]typed/test/Q2";
conversion:comment "";
conversion:range xsd:decimal;
];
Got:
todo
Added:
(the conversion:bundled_by statements)
Got:
:thing_2_53
e1:checks_conformance_of typed_permit:NY0261343 ;
e1:db typed_database:ICIS-NPDES ;
dcterms:source typed_database:ICIS-NPDES ;
e1:name "BRUNSWICK CENTRAL SCHOOL DIST" ;
con:address implicit_address:address_2 ;
e1:status "Effective" ;
e1:ownership "City government" ;
e1:pipe "001" ;
e1:paramtr "00056" ;
e1:name_2 "Flow rate" ;
e1:monlocn "1" ;
e1:name_3 "Effluent gross" ;
e1:period "1" ;
dcterms:date "2007-10-31"^^xsd:date ;
e1:test_type <http://logd.tw.rpi.edu/source/epa-gov/dataset/enforcement...-echo-measurements/typed/test/Q1> ;
rdf:value "5794"^^xsd:decimal ;
ov:csvRow "2"^^xsd:integer ;
ov:csvCol "53"^^xsd:integer ;
e1:unit "GPD" ;
e1:lsense "<=" ;
e1:lval "11220" ;
e1:lunit "GPD" ;
e1:ltype "avg" .
TODO: no triples should be produced for :thing_2_20, :thing_2_31, :thing_2_42, or :thing_2_64 (but how to specify that?).
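A cell-based subject names each value cell by its row and column, as in :thing_2_53 above. The sketch below mints such subjects and shows one possible policy for the TODO: skipping empty cells entirely. It assumes the other four value cells in row 2 are empty, which the TODO suggests but does not state, and it is not a description of what the converter does. `cell_subjects` is a hypothetical helper.

```python
def cell_subjects(row_number: int, value_columns: dict[int, str]) -> list[tuple[str, str, str]]:
    """Mint one subject per non-empty value cell, named :thing_<row>_<column>."""
    triples = []
    for column, value in sorted(value_columns.items()):
        if value == "":
            continue  # one possible policy: an empty cell yields no subject at all
        subject = f":thing_{row_number}_{column}"
        triples.append((subject, "rdf:value", f'"{value}"^^xsd:decimal'))
        triples.append((subject, "ov:csvRow", f'"{row_number}"^^xsd:integer'))
        triples.append((subject, "ov:csvCol", f'"{column}"^^xsd:integer'))
    return triples


# Row 2 of the example, assuming only column 53 (Q1_VALUE) holds a measurement.
for s, p, o in cell_subjects(2, {20: "", 31: "", 42: "", 53: "5794", 64: ""}):
    print(s, p, o, ".")
```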
Added:
conversion:enhance [
ov:csvCol 21;
ov:csvHeader "C1_UNIT";
conversion:equivalent_property muo:measuredIn;
conversion:range rdfs:Resource;
conversion:range_name "Unit";
];
(and similar for all other *_UNIT)
Got:
@prefix value_of_unit: <http://logd.tw.rpi.edu/source/epa-gov/dataset/enforcement-and-compliance-history-online-echo-measurements/value-of/unit/> .
:thing_2_53
rdf:value "5794"^^xsd:decimal ;
muo:measuredIn typed_unit:GPD .
Added:
conversion:enhance [
ov:csvCol 2;
ov:csvHeader "DB";
conversion:predicate rdfs:seeAlso;
conversion:object <http://www.epa-echo.gov/echo/compliance_report_water.html>;
];
Got:
typed_database:ICIS-NPDES
dcterms:identifier "ICIS-NPDES" ;
a local_vocab:Database ;
rdfs:label "ICIS-NPDES" ;
rdfs:seeAlso <http://www.epa-echo.gov/echo/compliance_report_water.html> .
Added:
<http://logd.tw.rpi.edu/source/epa-gov/dataset/enforcement-and-compliance-history-online-echo-measurements/version/2011-Apr-04/conversion/enhancement/1>
a conversion:LayerDataset, void:Dataset;
rdfs:seeAlso <http://www.epa-echo.gov/echo/effluents_help.html>;
foaf:homepage <http://www.epa-echo.gov/echo/>;
Got:
<http://logd.tw.rpi.edu/source/epa-gov/dataset/enforcement-and-compliance-history-online-echo-measurements/version/2011-Apr-04/conversion/enhancement/1>
a epa-gov_vocab:Dataset , conversion:Dataset , conversion:LayerDataset , void:Dataset ;
dcterms:modified "2011-04-05T11:17:44.343-05:00"^^xsd:dateTime ;
rdfs:seeAlso <http://www.epa-echo.gov/echo/effluents_help.html> ;
foaf:homepage <http://www.epa-echo.gov/echo/> ;
Table 1: Contents of Effluent Data Download Records at http://www.epa-echo.gov/echo/effluents_help.html (bottom)
TODO