Use case: Automated suggestion for linkable datasets via fiscal year
Use Case Name: Automated suggestion for linkable datasets - via fiscal year
Point of Contact Name: Timothy Lebo
The principal actor explores a dataset listing, becomes interested in a financial dataset, and receives a recommendation to incorporate two other datasets for subsequent analysis.
The principal actor explores a dataset listing curated by the Tetherless World Constellation (TWC) Linking Open Government Data (LOGD) group. The listing shows datasets from a variety of sources and offers references to the original data sources as well as RDF-encoded conversions that were created by the LOGD group. One advantage of the RDF-encoded conversions is the addition of explicit connections that do not exist among the original disparate datasets. These explicit connections result from interpretations, codified through human curation, that parameterize a conversion tool.
Using this listing, the principal actor becomes interested in the R&D budget data from NITRD and receives a recommendation to consider incorporating UK foreign aid data from DFID and US foreign aid data from USAID for subsequent analysis.
- Originating data source providers - An organization or person responsible for the creation of data.
- Originating data source publishers - An organization or person responsible for providing the original data to others. This is often the Originating data source provider or an agent entrusted by it.
  - NITRD offers their data from their own site.
  - USAID offers their data through a different site still affiliated with USAID.
  - DFID offers their data from their own site.
- Data re-publishers - An organization or person providing the original data (or a modified version) to others.
  - The organization responsible for the different site that provides USAID's data.
  - The TWC Linking Open Government Data group hosts links to the original data and RDF conversions of all three datasets (NITRD, USAID, DFID).
- Data finders - An organization or person responsible for informing a Data re-publisher or Data catalog provider of the existence of a particular dataset.
  - Jim Hendler informed LOGD of NITRD's dataset.
  - USAID informs their second-party host of their data.
  - James Michaelis informed LOGD of the datasets from USAID and DFID.
- Data transformers - An organization or person responsible for transforming the original published data into alternative formats (with relatively little increase in structural information).
  - Tim Lebo probably did most of this.
- Data enhancers - An organization or person responsible for transforming the original published data into alternative formats that significantly increase the structural information.
  - Tim Lebo wrote enhancement parameters for all three datasets (NITRD, USAID, DFID).
- Data catalog providers - An organization or person enabling others to discover the original published data.
  - data.gov provides a page listing USAID's data, referring to USAID's site to obtain the actual data. James Michaelis used this to find the USAID data.
  - The principal actor used the LOGD site to find all three datasets (NITRD, USAID, DFID).
The following states of the system must be met for the trigger (below) to initiate the use case.
- The datasets must have been available from the Originating data source providers (NITRD, USAID, DFID).
- The Data finders (Jim Hendler, USAID, James Michaelis) must have informed the Data re-publishers (second-party USAID site hosts, LOGD group) and Data catalog providers (data.gov, LOGD group) of the existence of the datasets.
- The Data enhancers (Tim Lebo) must have specified the appropriate interpretation parameters to the csv2rdf4lod converter.
- The Data re-publishers (Tim Lebo as part of LOGD) must have published the results of the newly aggregated datasets.
- The appropriate search apparatuses are in place to lead the principal actor to dereference the URI of the NITRD dataset.
These preconditions are illustrated in the Activity Diagram below.
- Trigger 1: Principal actor requests the URI of the NITRD dataset.
- Trigger 2: Principal actor requests the URI of the notion of Fiscal Year.
- Trigger 3: Principal actor requests the URI of the property of fiscal_year.
- Principal actor requests the URI of the NITRD dataset and is redirected to a page describing it.
- Principal actor's web browser requests the redirected page.
- The LOGD web server constructs and invokes a SPARQL query to the LOGD SPARQL endpoint.
- Principal actor's web browser displays the resulting page with a section suggesting USAID and DFID's datasets, citing them by their URIs.
- Principal actor requests the URI of the USAID dataset and is redirected to a page describing it.
- Principal actor returns to the NITRD dataset page by pressing the web browser's back button.
- Principal actor requests the URI of the DFID dataset and is redirected to a page describing it.
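The query that the LOGD web server constructs in the flow above is not given on this page. A minimal sketch of what it might look like follows, assuming that each dataset version is loaded into its own named graph and that fiscal years are attached with dcterms:temporal, as in the example queries later on this page. The graph URI `http://example.org/nitrd-dataset-graph` is a hypothetical placeholder for the NITRD dataset's graph, and the variable names are illustrative only.

```sparql
prefix dcterms: <http://purl.org/dc/terms/>

# Suggest named graphs that reference a fiscal year which is also
# referenced by the dataset currently being viewed.
select distinct ?suggested ?fy
where {
  # Hypothetical placeholder: the real NITRD graph URI is not stated here.
  values ?viewed { <http://example.org/nitrd-dataset-graph> }
  graph ?viewed    { ?entry      dcterms:temporal ?fy }
  graph ?suggested { ?otherEntry dcterms:temporal ?fy }
  # Do not suggest the dataset the actor is already looking at.
  filter (?suggested != ?viewed)
}
```

Under these assumptions, each ?suggested binding would be a candidate for the page's suggestion section, and ?fy records the shared fiscal year that justifies the suggestion.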
- URIs for the NITRD dataset, the USAID dataset, the DFID dataset, the notion of Fiscal Year, or the property of fiscal_year are not resolvable. This ends in an error state.
- Principal actor, instead of requesting the URI of the NITRD dataset, requests the URI of the notion of Fiscal Year or of the property of fiscal_year. In each of these alternatives, a page is presented that describes that entity (or relationship) and suggests related datasets based on common reference to the entity (or relationship).
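For the alternative that starts from a Fiscal Year entity, the suggestion could be driven by a query like the sketch below. It assumes, as above, one named graph per dataset version and the use of dcterms:temporal to reference fiscal years; the FY_2002 URI is taken from the example queries later on this page, and ?references is an illustrative name.

```sparql
prefix dcterms: <http://purl.org/dc/terms/>

# List the named graphs that reference one fiscal-year entity,
# ordered by how often each graph references it.
select ?dataset (count(?entry) as ?references)
where {
  graph ?dataset {
    ?entry dcterms:temporal <http://logd.tw.rpi.edu/instance-hub/financial/fiscal-year/united-states/FY_2002>
  }
}
group by ?dataset
order by desc(?references)
```

A variant for the property alternative would keep the property fixed and replace the fiscal-year URI with a variable, so that any dataset using the property is listed regardless of which year it references.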
- Principal actor is aware of the related datasets and can decide if they are worth incorporating during subsequent analysis.
- Principal actor has a conceptual understanding of how the related datasets are connected.
The following diagram shows how different agents were informed about a particular dataset, how knowledge of its existence led to an initial transformation and one (or more) enhancements, and how the system has the information required to suggest related datasets to the principal actor. We do not know who informed usaidallnet.gov of USAID's data, and we do not know who informed data.gov about the data hosted there. We do know that James found out about USAID's data via data.gov and informed the LOGD group. He also informed LOGD of the DFID data (which he presumably found via Google, which is not denoted). Jim Hendler informed LOGD about the NITRD data. Li Ding, Gino Gervasio, and Tim Lebo transformed the corresponding datasets to RDF using verbatim interpretations. Tim Lebo enhanced USAID and NITRD's data, while helping Gino enhance the DFID data. All resulting data was re-published by the LOGD infrastructure and is available as dump files, as linked data, and through a SPARQL endpoint. The LOGD Drupal site is populated by queries to the SPARQL endpoint.
Data re-publisher subClassOf Data catalog provider .
Data enhancer subClassOf Data transformer .
This section lists the set of resources (data, services, systems that offer them) required to support the capabilities described in this use case.
prefix base_vocab: <http://logd.tw.rpi.edu/vocab/>
prefix muo: <http://purl.oclc.org/NET/muo/muo#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select ?provider ?value ?units ?purpose
where {
  graph <http://logd.tw.rpi.edu/source/data-gov/dataset/1554/version/2011-Jan-12> {
    ?foreignAid
      a base_vocab:Transaction ;
      muo:measuredIn ?units ;
      base_vocab:provider ?provider ;
      base_vocab:recipient [ owl:sameAs <http://dbpedia.org/resource/Afghanistan> ] ;
      base_vocab:purpose ?purpose ;
      dcterms:temporal <http://logd.tw.rpi.edu/instance-hub/financial/fiscal-year/united-states/FY_2002> ;
      rdf:value ?value
  }
}
prefix base_vocab: <http://logd.tw.rpi.edu/vocab/>
prefix muo: <http://purl.oclc.org/NET/muo/muo#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select *
where {
  graph <http://logd.tw.rpi.edu/source/dfid-gov-uk/dataset/statistics-on-international-development-2009/version/2009-Nov-10> {
    ?foreignAid
      a base_vocab:Transaction ;
      muo:measuredIn ?units ;
      base_vocab:provider ?provider ;
      base_vocab:recipient [ owl:sameAs ?recipient ] ;
      base_vocab:purpose ?purpose ;
      dcterms:temporal ?fy ;
      rdf:value ?value
  }
}
Which recipients received money from both the US and the UK?
prefix base_vocab: <http://logd.tw.rpi.edu/vocab/>
prefix muo: <http://purl.oclc.org/NET/muo/muo#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select distinct ?recipient
where {
  graph <http://logd.tw.rpi.edu/source/dfid-gov-uk/dataset/statistics-on-international-development-2009/version/2009-Nov-10> {
    ?foreignAid
      a base_vocab:Transaction ;
      muo:measuredIn ?units ;
      base_vocab:provider ?provider ;
      base_vocab:recipient [ owl:sameAs ?recipient ] ;
      base_vocab:purpose ?purpose ;
      dcterms:temporal ?fy ;
      rdf:value ?value
  }
  graph <http://logd.tw.rpi.edu/source/data-gov/dataset/1554/version/2011-Jan-12> {
    ?foreignAid2
      a base_vocab:Transaction ;
      #muo:measuredIn ?units ;
      #base_vocab:provider ?provider;
      base_vocab:recipient [ owl:sameAs ?recipient ]
      #base_vocab:purpose ?purpose ;
      #dcterms:temporal <http://logd.tw.rpi.edu/instance-hub/financial/fiscal-year/united-states/FY_2002> ;
      #rdf:value ?value
  }
}