Merge branch 'development'
dboceck committed Jul 30, 2021
2 parents a60186f + 558ecc1 commit a79eee4
Showing 20 changed files with 52 additions and 18 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -30,6 +30,8 @@ If the annotation is not made beforehand make sure that the necessary database r
 The HPO resources needed for the prioritization step can be found in the `data` folder. The path to the files is specified in the configuration file make sure that it leads to the correct location.
 In the data folder there is also a standalone python script to generate these files (inside the script in the comments you can find the download links to the files that are used to generate the resources). For compatibility reasons the HPO graph resources were generated using networkx v1. Version 2 can still import these resources, but the graph generated with version 2 is not compatible with version 1.

+Last update of HPO resources: 30th July, 2021
+

 ## Pathogenicity prediction
 There are two random forest models that are used in AIdiva to predict the pathogenicity of a given variant. One for SNP variants and the other for inframe InDel variants. The training data of the two models consists of variants from Clinvar combined with additional variants from HGMD that are not present in Clinvar.
Binary file modified data/hpo_resources/gene2hpo.pkl
Binary file not shown.
Binary file modified data/hpo_resources/hgnc2gene.pkl
Binary file not shown.
Binary file modified data/hpo_resources/hpo_graph.pkl
Binary file not shown.
@@ -1,5 +1,5 @@
 help:
-	@cat Makefile_ABB-SCORE
+	@cat Makefile_ABB-SCORE.mk

 download:
 	wget -c https://public_docs.crg.es/sossowski/publication_data/ABB/ABB_SCORE.txt
@@ -1,5 +1,5 @@
 help:
-	@cat Makefile_Condel
+	@cat Makefile_Condel.mk

 download:
 	wget -c https://bbglab.irbbarcelona.org/fannsdb/downloads/fannsdb.tsv.gz
@@ -1,5 +1,5 @@
 help:
-	@cat Makefile_Eigen-phred
+	@cat Makefile_Eigen-phred.mk

 download:
 	wget -c http://web.corral.tacc.utexas.edu/WGSAdownload/resources/Eigen/Eigen_hg19_combined.tab.chr1.gz
@@ -1,5 +1,5 @@
 help:
-	@cat Makefile_FATHMM-XF
+	@cat Makefile_FATHMM-XF.mk

 download:
 	wget -c http://fathmm.biocompute.org.uk/fathmm-xf/fathmm_xf_coding.vcf.gz
30 changes: 30 additions & 0 deletions data/makefiles/Makefile_HPO.mk
@@ -0,0 +1,30 @@
+SHELL:=/bin/bash
+
+help:
+	@cat Makefile_HPO.mk
+
+download:
+	wget -c http://purl.obolibrary.org/obo/hp/hpoa/phenotype_to_genes.txt
+	wget -c http://purl.obolibrary.org/obo/hp.obo
+	wget -c http://purl.obolibrary.org/obo/hp/hpoa/phenotype_annotation.tab
+	wget -c http://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/tsv/hgnc_complete_set.txt
+	wget -c https://stringdb-static.org/download/protein.links.detailed.v11.0/9606.protein.links.detailed.v11.0.txt.gz
+	wget -c https://string-db.org/mapping_files/STRING_display_names/human.name_2_string.tsv.gz
+
+convert:
+	awk -F '\t' '{print $$5}' < phenotype_annotation.tab | sort | uniq -c | awk '{print $$2 "\t" $$1}' > HPO_counts.txt
+	python3 generate_HPO_resources.py --hpo_ontology hp.obo --gene_phenotype phenotype_to_genes.txt --gene_hpo gene2hpo.pkl --hpo_edges hpo_edges.pkl --hpo_counts HPO_counts.txt --hpo_graph hpo_graph.pkl --hgnc_symbols hgnc_complete_set.txt --hgnc_gene hgnc2gene.pkl --string_links 9606.protein.links.detailed.v11.0.txt.gz --string_mapping human.name_2_string.tsv.gz --gene_interacting gene2interacting.pkl
+
+	rm phenotype_to_genes.txt
+	rm hp.obo
+	rm phenotype_annotation.tab
+	rm hgnc_complete_set.txt
+	rm 9606.protein.links.detailed.v11.0.txt.gz
+	rm human.name_2_string.tsv.gz
+	rm hpo_edges.pkl
+	rm HPO_counts.txt
+
+	mv gene2hpo.pkl ../hpo_resources/
+	mv gene2interacting.pkl ../hpo_resources/
+	mv hgnc2gene.pkl ../hpo_resources/
+	mv hpo_graph.pkl ../hpo_resources/
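
The counting step of the `convert` target can also be sketched in Python, which may help when checking `HPO_counts.txt` by hand. This is a minimal sketch, not part of the commit: the function name `count_hpo_terms` and its parameters are hypothetical, and it assumes the annotation file is tab-separated with the HPO identifier in the fifth column, as the `awk` call suggests. Unlike `awk`, it skips lines with fewer than five fields instead of counting an empty value.

```python
from collections import Counter


def count_hpo_terms(annotation_path, counts_path, column=4):
    """Count the values in one tab-separated column and write 'value<TAB>count' lines.

    Hypothetical helper: `column` is zero-based, so 4 corresponds to awk's $5.
    """
    counts = Counter()
    with open(annotation_path, "r") as annotation_file:
        for line in annotation_file:
            fields = line.rstrip("\n").split("\t")
            # Assumption: lines that are too short carry no HPO term and are skipped.
            if len(fields) > column:
                counts[fields[column]] += 1

    with open(counts_path, "w") as counts_file:
        # Sorted by code point; the shell `sort` may order differently per locale.
        for term in sorted(counts):
            counts_file.write(f"{term}\t{counts[term]}\n")

    return counts
```

A header row in the annotation file is counted like any other value, so the resulting counts file can contain a header-like entry that a consumer has to skip.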
@@ -1,5 +1,5 @@
 help:
-	@cat Makefile_MutationAssessor
+	@cat Makefile_MutationAssessor.mk

 download:
 	wget -c http://mutationassessor.org/r3/MA_scores_rel3_hg19_full.tar.gz
@@ -1,5 +1,5 @@
 help:
-	@cat Makefile_gnomAD_OE
+	@cat Makefile_gnomAD_OE.mk

 download:
 	wget -c https://storage.googleapis.com/gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_gene.txt.bgz
@@ -1,5 +1,5 @@
 help:
-	@cat Makefile_phastCons46mammal
+	@cat Makefile_phastCons46mammal.mk

 download_hg19:
 	wget -c http://hgdownload.cse.ucsc.edu/goldenpath/hg19/phastCons46way/placentalMammals/chr1.phastCons46way.placental.wigFix.gz
@@ -1,5 +1,5 @@
 help:
-	@cat Makefile_phastCons46primate
+	@cat Makefile_phastCons46primate.mk

 download_hg19:
 	wget -c http://hgdownload.cse.ucsc.edu/goldenpath/hg19/phastCons46way/primates/chr1.phastCons46way.primates.wigFix.gz
@@ -1,5 +1,5 @@
 help:
-	@cat Makefile_phastCons46vertebrate
+	@cat Makefile_phastCons46vertebrate.mk

 download_hg19:
 	wget -c http://hgdownload.cse.ucsc.edu/goldenpath/hg19/phastCons46way/vertebrate/chr1.phastCons46way.wigFix.gz
@@ -1,5 +1,5 @@
 help:
-	@cat Makefile_phyloP46mammal
+	@cat Makefile_phyloP46mammal.mk

 download_hg19:
 	wget -c http://hgdownload.cse.ucsc.edu/goldenpath/hg19/phyloP46way/placentalMammals/chr1.phyloP46way.placental.wigFix.gz
@@ -1,5 +1,5 @@
 help:
-	@cat Makefile_phyloP46primate
+	@cat Makefile_phyloP46primate.mk

 download_hg19:
 	wget -c http://hgdownload.cse.ucsc.edu/goldenpath/hg19/phyloP46way/primates/chr1.phyloP46way.primate.wigFix.gz
@@ -1,5 +1,5 @@
 help:
-	@cat Makefile_phyloP46vertebrate
+	@cat Makefile_phyloP46vertebrate.mk

 download_hg19:
 	wget -c http://hgdownload.cse.ucsc.edu/goldenpath/hg19/phyloP46way/vertebrate/chr1.phyloP46way.wigFix.gz
@@ -1,5 +1,5 @@
 help:
-	@cat Makefile_segmentDuplication
+	@cat Makefile_segmentDuplication.mk

 download_hg19:
 	wget -O hg19.genomicSuperDups.txt.gz ftp://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/genomicSuperDups.txt.gz
@@ -1,5 +1,5 @@
 help:
-	@cat Makefile_simpleRepeat
+	@cat Makefile_simpleRepeat.mk

 download_hg19:
 	wget -O hg19.simpleRepeat.txt.gz ftp://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/simpleRepeat.txt.gz
@@ -7,7 +7,7 @@

 # get mapping gene -> HPOs
 # download from HPO charite phenotype to gene
-# wget http://compbio.charite.de/jenkins/job/hpo.annotations/lastStableBuild/artifact/util/annotation/phenotype_to_genes.txt
+# wget http://purl.obolibrary.org/obo/hp/hpoa/phenotype_to_genes.txt
 def generate_gene2hpo_dict(gene2phenotype_list, gene2hpo_dict):
     print("Generate gene to HPO mapping...")
     gene_2_HPO = dict()
@@ -31,7 +31,7 @@ def generate_gene2hpo_dict(gene2phenotype_list, gene2hpo_dict):


 # download data
-# wget https://raw.githubusercontent.com/obophenotype/human-phenotype-ontology/master/hp.obo
+# wget http://purl.obolibrary.org/obo/hp.obo
 def extract_hpo_graph_edges(hpo_ontology, hpo_edges_file):
     print("Extract HPO edges...")
     out_HPO = dict()
@@ -87,7 +87,7 @@ def extract_hpo_graph_edges(hpo_ontology, hpo_edges_file):
     print("HPO edges successfully extracted and saved as %s" % (hpo_edges_file))


-# counts as wget http://compbio.charite.de/jenkins/job/hpo.annotations/lastStableBuild/artifact/misc/phenotype_annotation.tab
+# counts as wget http://purl.obolibrary.org/obo/hp/hpoa/phenotype_annotation.tab
 # awk -F '\t' '{print $5}' < phenotype_annotation.tab | sort | uniq -c | awk '{print $2 "\t" $1}' > HPO_counts.txt
 def generate_hpo_graph(hpo_counts, hpo_edges_file, hpo_graph_file):
     print("Generate HPO graph...")
@@ -98,8 +98,10 @@ def generate_hpo_graph(hpo_counts, hpo_edges_file, hpo_graph_file):
     # generate graph with counts:
     counts_dict = dict()
     tot = 0
-    with open(hpo_counts) as count_file:
+    with open(hpo_counts, "r") as count_file:
         for line in count_file:
+            if line.startswith("HPO-ID"):
+                continue
             splitted_line = line.strip().split("\t")
             counts_dict[splitted_line[0]] = int(splitted_line[1])
             tot += int(splitted_line[1])
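
The header-skipping change in this hunk can be exercised in isolation. The sketch below pulls the loop into a standalone function; the name `read_hpo_counts`, the `header_prefix` parameter, and the blank-line guard are assumptions added for illustration and are not in the commit. The skipped row is presumably the column header of the annotation file, which the counting step turns into an `HPO-ID` entry.

```python
def read_hpo_counts(hpo_counts, header_prefix="HPO-ID"):
    """Read 'term<TAB>count' lines and return (counts_dict, total).

    Hypothetical standalone version of the loop in generate_hpo_graph.
    """
    counts_dict = {}
    total = 0
    with open(hpo_counts, "r") as count_file:
        for line in count_file:
            # Skip the header-derived row, as the commit does.
            if line.startswith(header_prefix):
                continue
            fields = line.strip().split("\t")
            # Assumption (not in the commit): ignore blank or malformed lines.
            if len(fields) < 2:
                continue
            counts_dict[fields[0]] = int(fields[1])
            total += int(fields[1])
    return counts_dict, total
```

Without the skip, a header row would enter `counts_dict` as a term and inflate the total by its count.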