
Example: vivohack11

Tim L edited this page Jun 16, 2014 · 44 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

Background

From May 4th to 7th, 2011, a group of biomedical/semweb folks met up for a hackathon at the University of Florida, graciously hosted by the VIVO project (local notes). In addition to meeting a lot of great people, I had the privilege and pleasure of working with Chintan Tank and Nick Benik to extend VIVO's current coauthorship visualizations in some pretty cool ways. Our very raw notes from the hackathon are on titanpad.

UF's article

Overview

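The first two steps started independently:

Step 1A) Chintan exposed the coauthorship calculations behind VIVO's visualization as Data Cube.

Step 1B) Tim built a large owl:sameAs graph linking entities across institutions.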

Step 2) Next, we got to show off the power of RDF and linked data:

  • Grab the "data cube" behind the visualization (a user who wants to extend the visualization shown right-clicks and copies the link)
  • Give it to a third-party app to
    • determine the URIs "behind the tally" of a histogram,
    • fetch owl:sameAs URIs from sameas.org,
    • resolve the URIs using content negotiation to augment University of Florida's data with data from WUSTL, Harvard, and bio2rdf.org, and
    • recompute the visualization using the augmented data set.
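
The third-party app's middle steps can be sketched roughly as follows. This is a minimal illustration, not the code we ran at the hackathon: the example.org URIs are hypothetical, the `same_as` dictionary stands in for whatever a co-reference service such as sameas.org returns, and the requests are only built, not sent.

```python
from urllib.request import Request

# Media types to ask for; the exact preference order is an assumption.
RDF_ACCEPT = "application/rdf+xml, text/turtle;q=0.9"


def rdf_request(uri):
    """Build (but do not send) a request that asks for RDF via content negotiation."""
    return Request(uri, headers={"Accept": RDF_ACCEPT})


def expand_with_same_as(uris, same_as):
    """Return the input URIs plus every URI listed as owl:sameAs to one of them.

    `same_as` maps a URI to an iterable of equivalent URIs.
    """
    expanded = set(uris)
    for uri in uris:
        expanded.update(same_as.get(uri, ()))
    return expanded


# Hypothetical URIs "behind the tally", for illustration only.
behind_tally = ["http://example.org/uf/article/1"]
same_as = {
    "http://example.org/uf/article/1": ["http://example.org/harvard/article/9"],
}

requests = [rdf_request(u) for u in sorted(expand_with_same_as(behind_tally, same_as))]
for r in requests:
    print(r.full_url, r.get_header("Accept"))
```

Passing each request to `urllib.request.urlopen` would perform the actual fetch; the returned RDF would then be merged with the original data before recomputing the visualization.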

Thanks to Hugh Glaser and Ian Millard for helping set up http://sameas.org/store/vivo/ quickly enough for us to demo at the end of the hackathon.

Step 1A: Exposing Data Cubes

Chintan modified some of VIVO's web site code to encode the visualization's coauthorship calculations as Data Cube and added a link next to the visualization in the web page.
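
To give a feel for what one encoded calculation might look like, here is a small sketch that renders a single tally as a `qb:Observation` in Turtle. Only the `qb:` terms come from the Data Cube vocabulary; the `ex:` properties, the URIs, and the numbers are hypothetical placeholders and are not taken from Chintan's actual output.

```python
PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix ex: <http://example.org/vocab#> .\n"
)


def observation_turtle(obs_uri, dataset_uri, author_uri, year, count):
    """Render one coauthorship tally as a qb:Observation in Turtle.

    ex:author, ex:year, and ex:coauthorCount are hypothetical
    dimension and measure properties.
    """
    return (
        f"<{obs_uri}> a qb:Observation ;\n"
        f"    qb:dataSet <{dataset_uri}> ;\n"
        f"    ex:author <{author_uri}> ;\n"
        f"    ex:year {year} ;\n"
        f"    ex:coauthorCount {count} .\n"
    )


# Hypothetical values, for illustration only.
turtle = PREFIXES + "\n" + observation_turtle(
    "http://example.org/cube/obs/1",
    "http://example.org/cube/coauthorship",
    "http://example.org/person/42",
    2010,
    7,
)
print(turtle)
```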

TODO: Chintan to describe where these are available and provide some examples. He emailed a zip just after the hackathon, which we sent to cygri.

Step 1B: Making a big owl:sameAs graph

Tim used this use case as an example to extend an existing dataset in TWC's Linked Open Biomedical Data. He ended up with about 300,000 sameAs triples among entities named with URIs from bio2rdf.org, Harvard Profiles, WUSTL's VIVO, UF's VIVO, and RPI's LOBD. The map and diagram below illustrate the connections resulting from converting three different datasets:

Some results from Nick:

Article Counts
-------------------------
  2,200 Cornell
  8,468 U of Florida [crawled]
     93 U of Indiana 
 33,620 Washington U of Med [crawled]
    118 Ponce
      0 Cornell Medical
     97 Scripps
284,000 Harvard Medical [crawled] 

Of the 3 sites crawled, 1,446 Articles had PubMedIDs at more than 1 site.
Example:   
 [site1:pmid=12345] === [site2:pmid=12345]


Of those 1,446 Articles occurring at 2 sites or more, 74 Articles had Authorship
links connected to the Article's record at at least 2 of the sites.
Example:  
  [site1:authorInAuthorship]<-->[site1:pmid=12345] === [site2:pmid=12345]<-->[site2:authorInAuthorship]
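
The PubMed ID matching above can be sketched as a simple grouping. This is an illustration under assumptions, not Nick's actual code: the records, site names, and URIs are made up, and each record is taken to be a `(site, article_uri, pmid)` tuple.

```python
from collections import defaultdict
from itertools import combinations


def group_by_pmid(records):
    """Map each PMID to the set of (site, article_uri) pairs that carry it."""
    groups = defaultdict(set)
    for site, uri, pmid in records:
        groups[pmid].add((site, uri))
    return groups


def same_as_pairs(records):
    """Return URI pairs for articles whose PMID appears at more than one site."""
    pairs = []
    for pmid, members in sorted(group_by_pmid(records).items()):
        sites = {site for site, _ in members}
        if len(sites) < 2:
            continue  # PMID seen at only one site: nothing to link
        uris = sorted(uri for _, uri in members)
        pairs.extend(combinations(uris, 2))
    return pairs


# Hypothetical records, for illustration only.
records = [
    ("site1", "http://example.org/site1/a1", "12345"),
    ("site2", "http://example.org/site2/b7", "12345"),
    ("site1", "http://example.org/site1/a2", "67890"),
]

for left, right in same_as_pairs(records):
    print(f"<{left}> owl:sameAs <{right}> .")
```

Each printed pair corresponds to one `===` link in the examples above.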

Step 2: Augmenting data behind a visualization

Related work
