
Use case: Automated suggestion for linkable datasets via fiscal year

Timothy Lebo edited this page Feb 14, 2012 · 87 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

Use Case Name: Automated suggestion for linkable datasets - via fiscal year

Point of Contact Name: Timothy Lebo

Goal

The principal actor explores a dataset listing, becomes interested in a financial dataset, and receives a recommendation to incorporate two other datasets into subsequent analysis.

Summary

The principal actor explores a dataset listing curated by the Tetherless World Constellation's (TWC) Linking Open Government Data (LOGD) group. The listing shows datasets from a variety of sources and links both to the original data sources and to RDF conversions created by the LOGD group. One advantage of the RDF conversions is that they add explicit connections that do not exist among the original, disparate datasets. These connections come from human-curated interpretations that are supplied as parameters to a conversion tool (csv2rdf4lod).

Using this listing, the principal actor becomes interested in the R&D budget data from NITRD and receives a recommendation to consider incorporating UK foreign aid data from DFID and US foreign aid data from USAID into subsequent analysis.
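The "via fiscal year" connection depends on datasets sharing a fiscal-year URI. The follow-up queries at the end of this page use `http://logd.tw.rpi.edu/instance-hub/financial/fiscal-year/united-states/FY_2002` as the value of `dcterms:temporal`. The minimal Python sketch below assumes that URI pattern extends to other years. It maps a calendar date to its US federal fiscal year, which begins on 1 October and takes the number of the calendar year in which it ends, and then mints the matching URI. The helper names are hypothetical and are not part of csv2rdf4lod.

```python
from datetime import date

# Base of the fiscal-year URIs used in the follow-up queries on this page.
# ASSUMPTION: every US fiscal year follows the same FY_<year> pattern.
FY_BASE = "http://logd.tw.rpi.edu/instance-hub/financial/fiscal-year/united-states/"


def us_fiscal_year(d: date) -> int:
    """Return the US federal fiscal year containing date d.

    The US federal fiscal year starts on 1 October and is named for the
    calendar year in which it ends, so 2001-10-01 falls in FY 2002.
    """
    return d.year + 1 if d.month >= 10 else d.year


def fiscal_year_uri(fy: int) -> str:
    """Mint the (hypothetical, pattern-based) instance-hub URI for US fiscal year fy."""
    return f"{FY_BASE}FY_{fy}"


if __name__ == "__main__":
    # Two records with different calendar dates resolve to the same
    # fiscal-year URI, so they become linkable on dcterms:temporal.
    for d in (date(2001, 11, 15), date(2002, 9, 30)):
        print(d, fiscal_year_uri(us_fiscal_year(d)))
```

Both records resolve to the same URI, so a query can join them on `dcterms:temporal` without comparing raw date strings.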

Actors

  • Originating data source providers - An organization or person responsible for the creation of data.
  • Originating data source publishers - An organization or person responsible for providing the original data to others. This is often the Originating data source provider itself, or an agent acting on its behalf.
  • Data re-publishers - An organization or person providing the original data (or a modified version) to others.
    • The organization that operates usaidallnet.gov, a separate site that hosts USAID's data.
    • The TWC Linking Open Government Data group hosts links to the original data and RDF conversions of all three datasets (NITRD, USAID, DFID).
  • Data finders - An organization or person responsible for informing a Data re-publisher or Data catalog provider of the existence of a particular dataset.
    • Jim Hendler informed LOGD of NITRD's dataset.
    • USAID informs its second-party host of its data.
    • James Michaelis informed LOGD of the datasets from USAID and DFID.
  • Data transformers - An organization or person responsible for transforming the original published data into alternative formats (with relatively little increase in structural information).
  • Data enhancers - An organization or person responsible for transforming the original published data into alternative formats in a way that significantly increases the structural information.
    • Tim Lebo wrote enhancement parameters for all three datasets (NITRD, USAID, DFID).
  • Data catalog providers - An organization or person that helps others discover the original published data.
    • data.gov provides a page listing USAID's data, referring to USAID's site to obtain the actual data. James Michaelis used this page to find the USAID data.
    • The principal actor used the LOGD site to find all three datasets (NITRD, USAID, DFID).

Preconditions

The following conditions must hold for the trigger (below) to initiate the use case.

  • The datasets must have been available from the Originating data source providers (NITRD, USAID, DFID).
  • The Data finders (Jim Hendler, USAID, James Michaelis) must have informed the Data re-publishers (second-party USAID site hosts, LOGD group) and Data catalog providers (data.gov, LOGD group) of the existence of the datasets.
  • The Data enhancers (Tim Lebo) must have specified the appropriate interpretation parameters to the csv2rdf4lod converter.
  • The Data re-publishers (Tim Lebo as part of LOGD) must have published the results of the newly aggregated datasets.
  • Search mechanisms must be in place that lead the principal actor to dereference the URI of the NITRD dataset.

These preconditions are illustrated in the Activity Diagram below.

Triggers

Basic Flow

  • The principal actor requests the URI of the NITRD dataset and is redirected to a page describing it.
  • The principal actor's web browser requests the page it was redirected to.
  • The LOGD web server constructs a SPARQL query and submits it to the LOGD SPARQL endpoint.
  • The principal actor's web browser displays the resulting page, which includes a section that suggests the USAID and DFID datasets and cites them by their URIs.
  • The principal actor requests the URI of the USAID dataset and is redirected to a page describing it.
  • The principal actor returns to the NITRD dataset page by pressing the web browser's back button.
  • The principal actor requests the URI of the DFID dataset and is redirected to a page describing it.
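The Basic Flow does not say which query the LOGD web server builds to produce its suggestions. The minimal sketch below assumes the server has already retrieved each dataset's fiscal-year coverage, for example as the distinct `dcterms:temporal` values in each graph. Under that assumption, the suggestion step reduces to "recommend every other dataset that shares at least one fiscal year with the one being viewed." The dataset keys, coverage sets, and function name are hypothetical and invented for illustration. They are not actual LOGD contents.

```python
from typing import Dict, List, Set

# HYPOTHETICAL fiscal-year coverage per dataset, invented for illustration;
# a real server would obtain these sets from the SPARQL endpoint.
EXAMPLE_COVERAGE: Dict[str, Set[str]] = {
    "nitrd": {"FY_2009", "FY_2010", "FY_2011"},
    "usaid": {"FY_2002", "FY_2009", "FY_2010"},
    "dfid": {"FY_2009"},
    "unrelated": {"FY_1995"},
}


def suggest_linkable(target: str, coverage: Dict[str, Set[str]]) -> List[str]:
    """Suggest datasets that share at least one fiscal year with target.

    Results are ordered by the number of shared fiscal years (most first),
    with ties broken alphabetically by dataset key.
    """
    mine = coverage[target]
    scored = []
    for name, years in coverage.items():
        if name == target:
            continue
        shared = len(mine & years)
        if shared:
            scored.append((-shared, name))
    return [name for _, name in sorted(scored)]


print(suggest_linkable("nitrd", EXAMPLE_COVERAGE))  # ['usaid', 'dfid']
```

Ranking by overlap size is one simple design choice. A real recommender might weight by other shared properties as well.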

Alternative Flow

Post Conditions

  • The principal actor is aware of the related datasets and can decide whether they are worth incorporating into subsequent analysis.
  • The principal actor has a conceptual understanding of how the related datasets are connected.

Activity Diagram

The following diagram shows how different agents were informed about each dataset, how knowledge of a dataset's existence led to an initial transformation and one or more enhancements, and how the system came to hold the information required to suggest related datasets to the principal actor. We do not know who informed usaidallnet.gov of USAID's data, nor who informed data.gov about the data hosted on usaidallnet.gov. We do know that James Michaelis found USAID's data via data.gov and informed the LOGD group. He also informed LOGD of the DFID data, which he presumably found via Google (this step is not shown in the diagram). Jim Hendler informed LOGD about the NITRD data. Li Ding, Gino Gervasio, and Tim Lebo transformed the corresponding datasets to RDF using verbatim interpretations. Tim Lebo enhanced the USAID and NITRD data and helped Gino Gervasio enhance the DFID data. The LOGD infrastructure re-published all resulting data as dump files, as Linked Data, and through a SPARQL endpoint. The LOGD Drupal site is populated by queries to that SPARQL endpoint.

[Diagram: flow from the original data providers to the principal actor]

Notes

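Data re-publisher subClassOf Data catalog provider (every Data re-publisher is also a Data catalog provider).
Data enhancer subClassOf Data transformer (every Data enhancer is also a Data transformer).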
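The two statements above read like RDFS `subClassOf` axioms. Under that reading, an agent typed with a subclass is also an instance of every superclass, and the relation is transitive. The minimal Python sketch below encodes the role names from the Actors section as plain strings. This encoding is hypothetical and is not how LOGD stores them. The sketch shows the inference a reasoner would draw: Tim Lebo, listed as a Data enhancer, is also a Data transformer.

```python
from typing import Dict, Set

# The subClassOf statements from the Notes section, as child -> parents.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "Data re-publisher": {"Data catalog provider"},
    "Data enhancer": {"Data transformer"},
}


def superclasses(cls: str) -> Set[str]:
    """Return cls plus every class it is (transitively) a subclass of."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_roles(asserted: Set[str]) -> Set[str]:
    """Expand an agent's asserted roles with all implied superclass roles."""
    roles: Set[str] = set()
    for role in asserted:
        roles |= superclasses(role)
    return roles


# Tim Lebo is asserted to be a Data enhancer in the Actors section.
print(sorted(inferred_roles({"Data enhancer"})))  # ['Data enhancer', 'Data transformer']
```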

Resources

This section lists the resources (data, services, and the systems that offer them) required to support the capabilities described in this use case.

[Diagram: flow from the original data providers to the principal actor]

Follow-up queries

Foreign aid transactions to Afghanistan in US fiscal year 2002, from data.gov dataset 1554:

prefix base_vocab: <http://logd.tw.rpi.edu/vocab/>
prefix muo:        <http://purl.oclc.org/NET/muo/muo#>
prefix dcterms:    <http://purl.org/dc/terms/>
prefix owl:        <http://www.w3.org/2002/07/owl#>
prefix rdf:        <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select ?provider ?value ?units ?purpose
where {
  graph <http://logd.tw.rpi.edu/source/data-gov/dataset/1554/version/2011-Jan-12> {
    ?foreignAid 
       a base_vocab:Transaction ;
       muo:measuredIn       ?units ;
       base_vocab:provider  ?provider;
       base_vocab:recipient [ owl:sameAs <http://dbpedia.org/resource/Afghanistan> ] ;
       base_vocab:purpose   ?purpose ;
       dcterms:temporal <http://logd.tw.rpi.edu/instance-hub/financial/fiscal-year/united-states/FY_2002> ;
       rdf:value ?value
  }
}
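
All foreign aid transactions in DFID's Statistics on International Development 2009 dataset, with each recipient resolved through owl:sameAs:

prefix base_vocab: <http://logd.tw.rpi.edu/vocab/>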
prefix muo:        <http://purl.oclc.org/NET/muo/muo#>
prefix dcterms:    <http://purl.org/dc/terms/>
prefix owl:        <http://www.w3.org/2002/07/owl#>
prefix rdf:        <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select *
where {
  graph <http://logd.tw.rpi.edu/source/dfid-gov-uk/dataset/statistics-on-international-development-2009/version/2009-Nov-10> {
    ?foreignAid 
       a base_vocab:Transaction ;
       muo:measuredIn       ?units;
       base_vocab:provider  ?provider;
       base_vocab:recipient [ owl:sameAs ?recipient ] ;
       base_vocab:purpose   ?purpose;
       dcterms:temporal     ?fy ;
       rdf:value            ?value
  }
}

Which recipients received money in both the US and UK results?

prefix base_vocab: <http://logd.tw.rpi.edu/vocab/>
prefix muo:        <http://purl.oclc.org/NET/muo/muo#>
prefix dcterms:    <http://purl.org/dc/terms/>
prefix owl:        <http://www.w3.org/2002/07/owl#>
prefix rdf:        <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select distinct ?recipient
where {
  graph <http://logd.tw.rpi.edu/source/dfid-gov-uk/dataset/statistics-on-international-development-2009/version/2009-Nov-10> {
    ?foreignAid 
       a base_vocab:Transaction ;
       muo:measuredIn       ?units;
       base_vocab:provider  ?provider;
       base_vocab:recipient [ owl:sameAs ?recipient ] ;
       base_vocab:purpose   ?purpose;
       dcterms:temporal     ?fy ;
       rdf:value            ?value
  }
  graph <http://logd.tw.rpi.edu/source/data-gov/dataset/1554/version/2011-Jan-12> {
    ?foreignAid2
       a base_vocab:Transaction ;
       #muo:measuredIn       ?units ;
       #base_vocab:provider  ?provider;
       base_vocab:recipient [ owl:sameAs ?recipient ] 
       #base_vocab:purpose   ?purpose ;
       #dcterms:temporal <http://logd.tw.rpi.edu/instance-hub/financial/fiscal-year/united-states/FY_2002> ;
       #rdf:value ?value
  }
}
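To run these queries from a program instead of a web form, a client can use the W3C SPARQL 1.1 Protocol. The client sends an HTTP GET with the query text in the `query` parameter and requests the SPARQL 1.1 Query Results JSON Format via the `Accept` header. The minimal standard-library Python sketch below follows that approach. The endpoint URL is a placeholder assumption, so substitute whichever endpoint hosts the LOGD graphs. Only `run_query` performs network I/O. The URL builder and the result parser work offline.

```python
import json
import urllib.parse
import urllib.request
from typing import List

# ASSUMPTION: placeholder endpoint; replace with the endpoint hosting the LOGD graphs.
ENDPOINT = "http://example.org/sparql"


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET request URL (SPARQL 1.1 Protocol)."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def column_values(results_json: str, var: str) -> List[str]:
    """Extract one variable's values from SPARQL 1.1 JSON results.

    Rows in which the variable is unbound are skipped.
    """
    data = json.loads(results_json)
    return [row[var]["value"] for row in data["results"]["bindings"] if var in row]


def run_query(endpoint: str, query: str, var: str) -> List[str]:
    """Send the query over HTTP and return the values bound to var."""
    req = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return column_values(resp.read().decode("utf-8"), var)
```

For the recipients query above, a call such as `run_query(ENDPOINT, query_text, "recipient")` would return the DBpedia URIs bound to `?recipient`, where `query_text` holds that query as a string.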