Example: vivohack11

csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

Background

From May 4 to 7, 2011, a group of biomedical and Semantic Web folks met for a hackathon at the University of Florida, graciously hosted by the VIVO project (local notes). In addition to meeting a lot of great people, I had the privilege and pleasure of working with Chintan Tank and Nick Benik to extend VIVO's current coauthorship visualizations in some pretty cool ways. Our very raw notes from the hackathon are on titanpad.

UF's article

Overview

The first two steps started independently:

Step 1A) Expose the data in the visualizations using DERI's Data Cube Vocabulary
Step 1B) Generate a big owl:sameAs graph and host at http://sameas.org/store/vivo/

Step 2) Next, we got to show off the power of RDF and linked data

Grab the "data cube" behind the visualization (a user who wants to extend the visualization shown right-clicks and copies the link).
Give it to a third-party app to:
- determine the URIs "behind the tally" of a histogram,
- fetch owl:sameAs URIs from sameas.org,
- resolve the URIs using content negotiation to augment University of Florida's data with data from WUSTL, Harvard, and bio2rdf.org,
- recompute the visualization using the augmented data set.
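
The third-party app's steps above can be sketched roughly as follows. This is a minimal illustration, not the code used at the hackathon: the function names, the `Accept` header value, and the idea of passing the sameAs lookup and the HTTP fetch in as callables are all assumptions made so the sketch stays independent of any particular sameas.org API.

```python
from urllib.request import Request

# Assumed preference order for RDF serializations; adjust to what the servers offer.
RDF_ACCEPT = "application/rdf+xml, text/turtle;q=0.9"


def build_rdf_request(uri):
    """Build a request that asks for RDF via HTTP content negotiation."""
    return Request(uri, headers={"Accept": RDF_ACCEPT})


def expand_with_same_as(uris, lookup_same_as):
    """Return each URI followed by its owl:sameAs equivalents, without duplicates.

    lookup_same_as is a hypothetical callable: URI -> list of equivalent URIs.
    """
    seen = []
    for uri in uris:
        for candidate in [uri, *lookup_same_as(uri)]:
            if candidate not in seen:
                seen.append(candidate)
    return seen


def augment(uris, lookup_same_as, fetch):
    """Fetch RDF for every URI and its equivalents, skipping URIs that fail.

    fetch is a hypothetical callable: Request -> RDF payload. It is expected
    to raise OSError (which urllib's URLError subclasses) on network failure.
    """
    results = {}
    for uri in expand_with_same_as(uris, lookup_same_as):
        try:
            results[uri] = fetch(build_rdf_request(uri))
        except OSError:
            continue  # an unreachable site should not sink the whole augmentation
    return results
```

In real use, `fetch` could wrap `urllib.request.urlopen` and `lookup_same_as` could query whichever sameAs store is available; the recomputed visualization would then be built from the merged payloads.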

Thanks to Hugh Glaser and Ian Millard for helping set up http://sameas.org/store/vivo/ quickly enough for us to demo at the end of the hackathon.

Step 1A: Exposing Data Cubes

Chintan modified some of VIVO's web site code to encode the visualization's coauthorship calculations as Data Cube and added a link next to the visualization in the web page.
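
To give a feel for what such an encoding might look like, here is a small sketch that serializes per-year coauthor counts as Data Cube observations in Turtle. It is not Chintan's VIVO code. The `qb:` terms (`qb:DataSet`, `qb:Observation`, `qb:dataSet`) come from the Data Cube vocabulary, but the `ex:` namespace, the `ex:year` and `ex:coauthorCount` properties, and the choice of year as the only dimension are assumptions for illustration.

```python
QB = "http://purl.org/linked-data/cube#"
EX = "http://example.org/vivohack11/"  # hypothetical namespace, not VIVO's


def observations_to_turtle(dataset_id, counts_by_year):
    """Serialize {year: coauthor count} as one qb:Observation per year."""
    lines = [
        f"@prefix qb: <{QB}> .",
        f"@prefix ex: <{EX}> .",
        "",
        f"ex:{dataset_id} a qb:DataSet .",
    ]
    for year in sorted(counts_by_year):
        lines.append(
            f"ex:{dataset_id}-{year} a qb:Observation ; "
            f"qb:dataSet ex:{dataset_id} ; "
            f"ex:year {year} ; "                            # assumed dimension property
            f"ex:coauthorCount {counts_by_year[year]} ."    # assumed measure property
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(observations_to_turtle("coauthors", {2009: 5, 2010: 3}))
```

A complete cube would also declare a `qb:DataStructureDefinition` describing its dimensions and measures; that is omitted here to keep the sketch short.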

TODO: Chintan to describe where these are available and provide some examples. He emailed a zip just after the hackathon, which we sent to cygri.

Step 1B: Making a big owl:sameAs graph

Tim used this use case as an example to extend an existing dataset in TWC's Linked Open Biomedical Data. He ended up with about 300,000 sameAs triples among entities named with URIs from bio2rdf.org, Harvard Profiles, WUSTL's VIVO, UF's VIVO, and RPI's LOBD. The map and diagram below illustrate the connections resulting from converting three different datasets:

Nick's crawl of WUSTL's VIVO, University of Florida's VIVO, and Harvard Profiles
- Nick walked the links and requested RDF from each page.
NCBI's gene2pubmed dataset
- This dataset reports which genes are mentioned by which publications.
NBIC's pmid2doi dataset
- This dataset lists the DOI and PubMed ID of the same publication (for 87M publications!).
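
Since all three datasets identify publications by PubMed ID, one way to derive sameAs links is to group URIs by shared PMID and link every pair in a group. The sketch below shows that idea only; it is not the csv2rdf4lod conversion Tim actually ran, and the `(uri, pmid)` record shape and the example URIs are assumptions.

```python
from collections import defaultdict
from itertools import combinations

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(records):
    """Return N-Triples lines linking every pair of URIs that share a PMID.

    records is an iterable of (uri, pmid) pairs drawn from any mix of sites.
    """
    uris_by_pmid = defaultdict(set)
    for uri, pmid in records:
        uris_by_pmid[pmid].add(uri)

    lines = []
    for pmid in sorted(uris_by_pmid):
        # Sorting makes the output deterministic; singleton groups yield no pairs.
        for left, right in combinations(sorted(uris_by_pmid[pmid]), 2):
            lines.append(f"<{left}> <{OWL_SAME_AS}> <{right}> .")
    return lines


if __name__ == "__main__":
    demo = [
        ("http://example.org/uf/article/1", "12345"),
        ("http://example.org/wustl/article/9", "12345"),
        ("http://example.org/uf/article/2", "99999"),
    ]
    print("\n".join(same_as_triples(demo)))
```

Only one direction of each pair is emitted here; because owl:sameAs is symmetric, a store that applies that semantics can infer the reverse link.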

Some results from Nick:

Article Counts
-------------------------
  2,200 Cornell
  8,468 U of Florida [crawled]
     93 U of Indiana 
 33,620 Washington U of Med [crawled]
    118 Ponce
      0 Cornell Medical
     97 Scripps
284,000 Harvard Medical [crawled] 

Of the 3 sites crawled, 1,446 Articles had PubMedIDs at more than 1 site.
Example:   
 [site1:pmid=12345] === [site2:pmid=12345]


Of those 1,446 Articles occurring at 2 sites or more, 74 Articles had Authorship
links connected to the Article's record at 2 or more of the sites.
Example:  
  [site1:authorInAuthorship]<-->[site1:pmid=12345] === [site2:pmid=12345]<-->[site2:authorInAuthorship]
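
For readers who want to reproduce this kind of tally on their own crawl, the two counts above could be computed along these lines. This is a sketch under assumptions, not Nick's script: the `(site, pmid, has_authorship)` record shape is hypothetical, and the sample data is made up.

```python
from collections import defaultdict


def articles_at_multiple_sites(records, min_sites=2):
    """Return (shared, shared_with_authorship) as sets of PMIDs.

    records is an iterable of (site, pmid, has_authorship) tuples, where
    has_authorship says whether that site's article record has Authorship links.
    """
    sites_by_pmid = defaultdict(set)
    authored_sites_by_pmid = defaultdict(set)
    for site, pmid, has_authorship in records:
        sites_by_pmid[pmid].add(site)
        if has_authorship:
            authored_sites_by_pmid[pmid].add(site)

    # PMIDs seen at min_sites or more distinct sites.
    shared = {p for p, sites in sites_by_pmid.items() if len(sites) >= min_sites}
    # Of those, PMIDs whose record has Authorship links at min_sites or more sites.
    shared_with_authorship = {
        p for p in shared if len(authored_sites_by_pmid[p]) >= min_sites
    }
    return shared, shared_with_authorship


if __name__ == "__main__":
    demo = [
        ("uf", "1", True), ("wustl", "1", True),
        ("uf", "2", True), ("harvard", "2", False),
        ("uf", "3", True),
    ]
    shared, linked = articles_at_multiple_sites(demo)
    print(len(shared), len(linked))
```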
