Ben Hutton edited this page Mar 11, 2015 · 3 revisions

Algorithms for phenotype matching

This page provides a wonderful summary of various semantic similarity scores.

UI score

The easiest measure to implement that still performs decently (on our test data) is probably the UI score.

Given two bags of HPO terms, p and q, the UI score is defined as:

  • for a set of terms t, let I(t) be the set containing the terms in t together with all of their ancestors
  • UI(p, q) = |I(p) ∩ I(q)| / |I(p) ∪ I(q)|, i.e. the Jaccard index of the two ancestor-closed sets

An interactive demo of this calculation can be found at http://www.planetcalc.com/1664/
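The UI definition can be sketched in Python as follows. This is a minimal illustration, not the implementation we used: it assumes the ontology is available as a mapping from each term to its direct parents, and the toy ontology `PARENTS` and its term names are hypothetical stand-ins for real HPO terms. `closure` computes I(t) by walking up the parent links.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy ontology: each term maps to its direct parents (is_a links).
# Real HPO IDs look like "HP:0000118"; these names are made up for illustration.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "B1": {"B"},
}


def closure(terms: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """I(t): the given terms plus all of their ancestors."""
    result: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in result:
            result.add(term)
            stack.extend(parents.get(term, ()))
    return result


def ui_score(p: Iterable[str], q: Iterable[str], parents: Dict[str, Set[str]]) -> float:
    """UI(p, q) = |I(p) & I(q)| / |I(p) | I(q)| (Jaccard index of the closures)."""
    ip, iq = closure(p, parents), closure(q, parents)
    union = ip | iq
    if not union:
        return 0.0  # two empty bags: define the score as 0
    return len(ip & iq) / len(union)


print(ui_score({"A1"}, {"A2"}, PARENTS))
```

Here `A1` and `A2` share the ancestors `A` and `root`, so I(p) ∩ I(q) has 2 terms and I(p) ∪ I(q) has 4, giving 0.5; `A1` vs `B1` share only `root`, giving 1/5 = 0.2.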

simGIC score

The simGIC score appears to perform better in practice, but it weights each term (node) by its information content, which has to be estimated from a corpus of annotations. We estimated information content from the OMIM->HPO mappings. See this page for a description of the scoring.
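A minimal Python sketch of simGIC, under stated assumptions rather than as a record of exactly what we ran. Information content is taken as IC(t) = -log(f(t)), where f(t) is the fraction of annotated diseases whose ancestor-closed term set contains t. simGIC(p, q) is then the sum of IC(t) over I(p) ∩ I(q) divided by the sum of IC(t) over I(p) ∪ I(q). The toy `ANNOTATIONS` corpus stands in for the OMIM->HPO mappings and, like `PARENTS`, is hypothetical. Terms that never occur in the corpus get zero weight here, which is a simplification.

```python
import math
from typing import Dict, Iterable, Set

# Hypothetical toy ontology (term -> direct parents).
PARENTS: Dict[str, Set[str]] = {
    "root": set(), "A": {"root"}, "B": {"root"},
    "A1": {"A"}, "A2": {"A"}, "B1": {"B"},
}

# Hypothetical corpus standing in for OMIM->HPO annotations: disease -> terms.
ANNOTATIONS: Dict[str, Set[str]] = {
    "disease1": {"A1"},
    "disease2": {"A2"},
    "disease3": {"B1"},
    "disease4": {"A1", "B1"},
}


def closure(terms: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """I(t): the given terms plus all of their ancestors."""
    result: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in result:
            result.add(term)
            stack.extend(parents.get(term, ()))
    return result


def information_content(annotations: Dict[str, Set[str]],
                        parents: Dict[str, Set[str]]) -> Dict[str, float]:
    """IC(t) = -log(fraction of diseases annotated with t or a descendant of t)."""
    counts: Dict[str, int] = {}
    for terms in annotations.values():
        for t in closure(terms, parents):  # propagate each annotation to ancestors
            counts[t] = counts.get(t, 0) + 1
    n = len(annotations)
    return {t: -math.log(c / n) for t, c in counts.items()}


def sim_gic(p: Iterable[str], q: Iterable[str],
            parents: Dict[str, Set[str]], ic: Dict[str, float]) -> float:
    """Sum of IC over the shared closure divided by sum of IC over the union."""
    ip, iq = closure(p, parents), closure(q, parents)
    denom = sum(ic.get(t, 0.0) for t in ip | iq)  # unseen terms weigh 0 (assumption)
    if denom == 0.0:
        return 0.0  # union carries no information, e.g. only the root
    return sum(ic.get(t, 0.0) for t in ip & iq) / denom


IC = information_content(ANNOTATIONS, PARENTS)
print(sim_gic({"A1"}, {"A2"}, PARENTS, IC))
```

Because the root annotates every disease, its IC is 0 and a shared root contributes nothing, unlike in the UI score. For `A1` vs `A2` the only informative shared term is `A`, so the score is IC(A) / (IC(A1) + IC(A2) + IC(A)), about 0.12 on this toy corpus.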