Phenotype matching
Ben Hutton edited this page Mar 11, 2015
·
3 revisions
This page provides a wonderful summary of various semantic similarity scores.
The easiest measure to implement that still performs decently on our test data is probably the UI score.
Given two bags of HPO terms, p and q, the UI score is defined as:
- let I(t), for a set of terms t, be the set of terms in t together with all the ancestors of the terms in t
UI(p, q) = Size{Intersection{I(p), I(q)}} / Size{Union{I(p), I(q)}}
An interactive demo of this working can be found at http://www.planetcalc.com/1664/
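The UI definition is a Jaccard index over the ancestor closures of the two term sets, so it can be sketched in a few lines of Python. This is only a minimal sketch: the `parents` mapping (term to its direct parents), the function names, and the toy term IDs are all hypothetical, and a real implementation would load the hierarchy from the HPO ontology file.

```python
def ancestor_closure(terms, parents):
    """Return I(terms): the terms themselves plus all of their ancestors.

    `parents` is an assumed dict mapping each term to its direct parents.
    """
    closure = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closure:
            closure.add(term)
            stack.extend(parents.get(term, ()))
    return closure


def ui_score(p, q, parents):
    """UI(p, q) = |I(p) & I(q)| / |I(p) | I(q)|."""
    ip = ancestor_closure(p, parents)
    iq = ancestor_closure(q, parents)
    union = ip | iq
    if not union:
        # Both bags are empty; returning 0.0 here is an arbitrary choice.
        return 0.0
    return len(ip & iq) / len(union)


# Toy hierarchy with made-up term names (not real HPO IDs):
# A is the root, B and C are children of A, D is a child of B,
# and E is a child of both B and C.
PARENTS = {
    "B": ["A"],
    "C": ["A"],
    "D": ["B"],
    "E": ["B", "C"],
}

# I({D}) = {A, B, D} and I({E}) = {A, B, C, E}:
# 2 shared terms out of 5 in the union.
print(ui_score(["D"], ["E"], PARENTS))  # → 0.4
```

Because the closure is a set, duplicate terms in a bag have no effect on the score.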
The simGIC score appears to perform better in practice, but it requires weighting each node by its information content, which has to be estimated from a corpus. We estimated it using the OMIM->HPO mappings. See this page for a description of the scoring.
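As a rough illustration of the weighting, the sketch below uses the common formulation of simGIC: the summed information content of the shared terms divided by the summed information content of all terms in either closure, with IC(t) = -log(fraction of corpus entries annotated with t or one of its descendants). That formulation, the function names, the toy corpus, and the choice to give terms missing from the corpus a weight of 0 are all assumptions for illustration, not necessarily what we ran.

```python
import math


def ancestor_closure(terms, parents):
    """Return the terms themselves plus all of their ancestors."""
    closure = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closure:
            closure.add(term)
            stack.extend(parents.get(term, ()))
    return closure


def information_content(annotations, parents):
    """Estimate IC(t) = -log(freq(t)) from a corpus.

    `annotations` is an assumed dict mapping each corpus entry (for
    example a disease) to its terms. freq(t) is the fraction of entries
    whose ancestor closure contains t.
    """
    counts = {}
    for terms in annotations.values():
        for term in ancestor_closure(terms, parents):
            counts[term] = counts.get(term, 0) + 1
    total = len(annotations)
    return {term: -math.log(n / total) for term, n in counts.items()}


def simgic_score(p, q, parents, ic):
    """IC-weighted version of UI: sum of IC over the intersection of the
    closures divided by sum of IC over their union."""
    ip = ancestor_closure(p, parents)
    iq = ancestor_closure(q, parents)
    # Assumption: terms never seen in the corpus get weight 0.
    union_weight = sum(ic.get(t, 0.0) for t in ip | iq)
    if union_weight == 0:
        return 0.0
    return sum(ic.get(t, 0.0) for t in ip & iq) / union_weight


# Toy hierarchy and corpus with made-up names (not real HPO or OMIM IDs).
PARENTS = {"B": ["A"], "C": ["A"], "D": ["B"], "E": ["B", "C"]}
ANNOTATIONS = {
    "disease1": ["D"],
    "disease2": ["E"],
    "disease3": ["C"],
    "disease4": ["D", "E"],
}

IC = information_content(ANNOTATIONS, PARENTS)
print(simgic_score(["D"], ["E"], PARENTS, IC))
```

In this toy corpus the root term A annotates every entry, so its information content is 0 and it contributes nothing to the score, whereas in the unweighted UI score it counts as much as any other shared term.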