Example: vivohack11
From May 4th to 7th, 2011, a bunch of biomedical/semweb folks met up for a hackathon at the University of Florida, graciously hosted by the VIVO project (local notes). In addition to meeting a bunch of great people, I had the privilege and pleasure of working with Chintan Tank and Nick Benik to extend VIVO's current coauthorship visualizations in some pretty cool ways. Our very raw notes from the hackathon are on titanpad.
The first two steps started independently:
- Step 1A) Expose the data in the visualizations using DERI's Data Cube Vocabulary
- Step 1B) Generate a big owl:sameAs graph and host at http://sameas.org/store/vivo/
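One way to picture Step 1B, as a minimal sketch: assume each site's articles have already been reduced to (URI, PubMed ID) pairs, and treat a shared PubMed ID as evidence that two URIs name the same article. The matching key, the function name `same_as_triples`, and the `example.org` URIs are all hypothetical; the graph hosted at sameas.org may have been built differently.

```python
from collections import defaultdict
from itertools import combinations

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def same_as_triples(records):
    """Yield N-Triples lines linking every pair of URIs that share a key.

    records: iterable of (uri, key) pairs, e.g. (article URI, PubMed ID).
    """
    by_key = defaultdict(set)
    for uri, key in records:
        by_key[key].add(uri)
    for key in sorted(by_key):
        # owl:sameAs is symmetric, so emit each unordered pair only once.
        for a, b in combinations(sorted(by_key[key]), 2):
            yield f"<{a}> <{OWL_SAME_AS}> <{b}> ."

# Hypothetical URIs: article a1 (UF) and b7 (WUSTL) share PubMed ID 12345.
records = [
    ("http://uf.example.org/individual/a1", "12345"),
    ("http://wustl.example.org/individual/b7", "12345"),
    ("http://uf.example.org/individual/a2", "99999"),
]
for line in same_as_triples(records):
    print(line)
```

Emitting every pair grows quadratically with the number of URIs per key; if the consumer computes the owl:sameAs closure itself, a chain linking consecutive URIs would carry the same information.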
Step 2) Next, we got to show off the power of RDF and linked data
- Grab the "data cube" behind the visualization (a user interested in extending the visualization right-clicks to copy its link).
- Give it to a third-party app to:
  - determine the URIs "behind the tally" of a histogram,
  - fetch owl:sameAs URIs from sameas.org,
  - resolve the URIs using content negotiation to augment University of Florida's data with data from WUSTL, Harvard, and bio2rdf.org, and
  - recompute the visualization using the augmented data set.
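The merge-and-recompute part of Step 2 can be sketched as follows, assuming the fetched data has been boiled down to (article URI, author URI) pairs and the sameas.org answers to (URI, URI) pairs. The network steps are left out; `rdf_request` only shows how an `Accept` header drives content negotiation. "Distinct coauthors per author" stands in for whatever tally the VIVO visualization actually shows, and all function names and URIs here are hypothetical.

```python
import urllib.request
from collections import Counter

def rdf_request(uri):
    """Build a request asking for RDF rather than HTML (content negotiation).
    Pass it to urllib.request.urlopen() to actually fetch the description."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})

class UnionFind:
    """Groups URIs that owl:sameAs links declare to be the same thing."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def coauthor_tally(authorships, same_as):
    """Count distinct coauthors per author after merging owl:sameAs URIs.

    authorships: (article URI, author URI) pairs gathered from every source.
    same_as: (URI, URI) pairs treated as naming the same resource.
    """
    uf = UnionFind()
    for a, b in same_as:
        uf.union(a, b)
    # Canonicalize articles and authors so duplicates from different sites collapse.
    authors_by_article = {}
    for article, author in authorships:
        authors_by_article.setdefault(uf.find(article), set()).add(uf.find(author))
    coauthors = {}
    for authors in authors_by_article.values():
        for author in authors:
            coauthors.setdefault(author, set()).update(authors - {author})
    return Counter({author: len(others) for author, others in coauthors.items()})

UF, WU = "http://uf.example.org/", "http://wustl.example.org/"
authorships = [
    (UF + "article/1", UF + "person/alice"),
    (UF + "article/1", UF + "person/bob"),
    (WU + "article/9", WU + "person/alice"),
    (WU + "article/9", WU + "person/carol"),
]
same_as = [(UF + "article/1", WU + "article/9"), (UF + "person/alice", WU + "person/alice")]
print(coauthor_tally(authorships, same_as))
```

Without the sameAs links, UF's Alice has one coauthor (Bob); once the two copies of the article and of Alice are merged, she has two (Bob and Carol), which is the kind of change the augmented visualization would show.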
Thanks to Hugh Glaser and Ian Millard for helping set up http://sameas.org/store/vivo/ quickly enough for us to demo at the end of the hackathon.
Chintan modified some of VIVO's web site code to encode the visualization's coauthorship calculations as Data Cube and added a link next to the visualization in the web page.
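As a rough illustration of what encoding coauthorship counts as Data Cube can look like, here is a minimal sketch that emits one `qb:Observation` per coauthor in a `qb:DataSet`. The `qb:` terms come from the Data Cube vocabulary, but the `ex:coauthor` dimension, the `ex:sharedPublications` measure, and the observation URI scheme are hypothetical placeholders, not the properties Chintan's VIVO code uses.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"
EX = "http://example.org/vivo-vis#"  # hypothetical namespace for dimension/measure

def coauthorship_cube(dataset_uri, counts):
    """Return N-Triples for a qb:DataSet with one qb:Observation per coauthor.

    counts: mapping of coauthor URI -> number of shared publications.
    """
    lines = [f"<{dataset_uri}> <{RDF_TYPE}> <{QB}DataSet> ."]
    for i, (coauthor, n) in enumerate(sorted(counts.items())):
        obs = f"{dataset_uri}/obs{i}"  # hypothetical observation URI scheme
        lines += [
            f"<{obs}> <{RDF_TYPE}> <{QB}Observation> .",
            f"<{obs}> <{QB}dataSet> <{dataset_uri}> .",
            f"<{obs}> <{EX}coauthor> <{coauthor}> .",
            f'<{obs}> <{EX}sharedPublications> "{n}"^^<{XSD_INT}> .',
        ]
    return "\n".join(lines)

print(coauthorship_cube("http://example.org/cube/alice",
                        {"http://example.org/person/bob": 3}))
```

A complete Data Cube dataset would also point to a `qb:DataStructureDefinition` via `qb:structure`; that is omitted here to keep the sketch short.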
TODO: Chintan to describe where these are available and provide some examples. He emailed a zip file just after the hackathon, which we sent to cygri.
Tim used this use case as an example to extend an existing dataset in TWC's Linked Open Biomedical Data. He ended up with about 300,000 sameAs triples among entities named with URIs from bio2rdf.org, Harvard Profiles, WUSTL's VIVO, UF's VIVO, and RPI's LOBD. The map and diagram below illustrate the connections resulting from converting three different datasets:
- Nick's crawl of WUSTL VIVO, Univ Florida's VIVO, and Harvard Profiles.
  - Nick walked the links and requested RDF from each page.
- NCBI's gene2pubmed dataset
  - This dataset reports which genes are mentioned by which publications.
- NBIC's pmid2doi dataset
  - This dataset lists the DOI and pubmedID of the same publication (for 87M publications!).
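Joining the two biomedical datasets on PubMed ID is what connects genes to publications identified by DOI. A minimal sketch: the gene2pubmed layout assumed here (tab-separated `tax_id`, `GeneID`, `PubMed_ID`, with a `#` header line) follows NCBI's file, but the two-column `pmid<TAB>doi` layout for pmid2doi is an assumption, and the function names and sample rows are made up.

```python
import csv
import io

def read_gene2pubmed(f):
    """Yield (GeneID, PubMed_ID) pairs from NCBI's tab-separated gene2pubmed."""
    for row in csv.reader(f, delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blank and header lines
            continue
        yield row[1], row[2]

def read_pmid2doi(f):
    """Assumed layout: one 'pmid<TAB>doi' pair per line."""
    return {row[0]: row[1] for row in csv.reader(f, delimiter="\t") if len(row) >= 2}

def genes_to_dois(gene2pubmed, pmid2doi):
    """Yield (GeneID, PubMed_ID, DOI) for publications that have a known DOI."""
    dois = read_pmid2doi(pmid2doi)
    for gene, pmid in read_gene2pubmed(gene2pubmed):
        if pmid in dois:
            yield gene, pmid, dois[pmid]

# Made-up sample rows standing in for the real files.
g2p = io.StringIO("#tax_id\tGeneID\tPubMed_ID\n9606\t1\t12345\n9606\t2\t67890\n")
p2d = io.StringIO("12345\t10.1000/example.1\n")
print(list(genes_to_dois(g2p, p2d)))
```

Loading pmid2doi into a dictionary keeps the sketch simple; at 87M rows, a real conversion would more likely stream both files or push the join into a triple store.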
Some results from Nick:
Article Counts
-------------------------
2,200 Cornell
8,468 U of Florida [crawled]
93 U of Indiana
33,620 Washington U of Med [crawled]
118 Ponce
0 Cornell Medical
97 Scripps
284,000 Harvard Medical [crawled]
Of the 3 sites crawled, 1,446 articles had a PubMed ID that appeared at more than one site.
Example:
[site1:pmid=12345] === [site2:pmid=12345]
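The cross-site match above amounts to grouping PubMed IDs by the sites they were crawled from. A minimal sketch, assuming the PubMed IDs have already been extracted from each site's RDF; the function name, site keys, and IDs are hypothetical.

```python
from collections import defaultdict

def pmids_shared_across_sites(site_pmids):
    """Return the PubMed IDs that appear at two or more sites.

    site_pmids: mapping of site name -> iterable of PubMed IDs crawled there.
    """
    sites_by_pmid = defaultdict(set)
    for site, pmids in site_pmids.items():
        for pmid in pmids:
            sites_by_pmid[pmid].add(site)
    return {pmid for pmid, sites in sites_by_pmid.items() if len(sites) > 1}

crawl = {
    "UF": ["12345", "111"],
    "WUSTL": ["12345", "222"],
    "Harvard": ["222", "333"],
}
print(sorted(pmids_shared_across_sites(crawl)))
```

Collecting sites into a set means a PubMed ID listed twice at the same site is not mistaken for a cross-site match.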
Of those 1,446 articles occurring at two or more sites, 74 had Authorship links connected to the article's record at two or more of the sites.
Example:
[site1:authorInAuthorship]<-->[site1:pmid=12345] === [site2:pmid=12345]<-->[site2:authorInAuthorship]
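The stricter authorship criterion can be sketched in the same spirit, assuming each site's crawl has been summarized as a count of authorship links per PubMed ID (a hypothetical shape; the function name, site names, and numbers are made up). Any ID it returns has links at two or more sites, so the result is always a subset of the PubMed IDs shared across sites.

```python
from collections import Counter

def shared_with_authorship(site_links, min_sites=2):
    """Return PubMed IDs whose record has authorship links at >= min_sites sites.

    site_links: mapping of site -> {PubMed ID: number of authorship links}.
    """
    sites_with_links = Counter()
    for links in site_links.values():
        for pmid, n_links in links.items():
            if n_links > 0:
                sites_with_links[pmid] += 1
    return {pmid for pmid, n in sites_with_links.items() if n >= min_sites}

site_links = {
    "UF": {"12345": 2, "222": 0},
    "WUSTL": {"12345": 1, "222": 3},
    "Harvard": {"222": 0},
}
print(shared_with_authorship(site_links))
```

Here "222" is held at three sites but only WUSTL's record has authorship links, so it does not qualify.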