Comprehensive Resource of Biomedical Relations with Deep Learning and Network Representations (CROssBAR) - Biomedical Networks
The CROssBAR project aims to develop a large-scale, open-access system that annotates complex relations between drugs/compounds, target biomolecules, pathways and diseases, through biological data integration and machine learning based relation prediction.
Sub-projects under CROssBAR:
1) Construction of the CROssBAR database by integrating biological data from various resources
2) Large-scale prediction of unknown drug-target interactions (as well as non-interactions), by developing and applying a deep learning based ML method
3) Generation of the biomedical networks, where nodes will represent compounds/drugs, genes/proteins, pathways and diseases, and the edges will represent the known and predicted pairwise relations
4) Biological evaluation (experimental validation) of the selected results on PI3K/AKT/mTOR pathway, in terms of liver cancer mechanisms
5) Construction of an open access web-service, where users can search for an entity of interest and browse the related network and its components
Repository for sub-project 2: https://github.com/cansyl/DEEPScreen
We constructed a prototype network from the data resources integrated in CROssBAR, applying multiple enrichment based filters to include only the most relevant biomedical entities. Later, this workflow will be automated to generate similar networks and visualize them on the fly through the CROssBAR web-service, using the CytoScape browser plug-in.
The network below will be displayed to a web-service user who searches for the term “hepatocellular carcinoma”.
Workflow for the construction of the network:
The prototype network model was created in 7 main steps:
1. The selection of HCC related genes:
- KEGG (H00048): 20 genes
- OMIM (Phenotype MIM 114550): 9 genes
- OpenTargets (EFO_0000182): 18 genes (with score > 0.2 «genetic associations»)
- TCGA_HCC: 34 genes (expert knowledge)
- 61 HCC related genes in total
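The per-source counts add up to 81, while the final list holds 61 genes, which is consistent with merging the four lists and counting each gene once. A minimal sketch of such a merge is given below, assuming each source is available as a list of gene symbols. The helper name `merge_gene_lists` and the gene symbols are placeholders for illustration, not the actual project lists.

```python
def merge_gene_lists(sources):
    """Return the sorted union of gene symbols and, per gene, the sources listing it."""
    support = {}
    for source_name, genes in sources.items():
        for gene in genes:
            support.setdefault(gene, set()).add(source_name)
    return sorted(support), support


# Placeholder gene lists (hypothetical, for illustration only)
example_sources = {
    "KEGG": ["TP53", "CTNNB1", "AXIN1"],
    "OMIM": ["TP53", "MET"],
    "OpenTargets": ["CTNNB1", "TERT"],
    "TCGA_HCC": ["TP53", "TERT", "ARID1A"],
}

genes, support = merge_gene_lists(example_sources)
print(len(genes), "unique genes")
print("TP53 is listed by:", sorted(support["TP53"]))
```

Keeping the per-gene source sets makes it possible to see later which genes are supported by more than one resource.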
2. The determination of protein-protein interactions (PPIs):
- STRING application on CytoScape
- PPIs with a confidence score >= 0.95
- 45 PPIs between 31 proteins
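The confidence filter in this step can be sketched outside CytoScape as follows, assuming the interactions have been exported as (protein A, protein B, score) tuples. The function name `filter_ppis` and the example edges are hypothetical; only the 0.95 threshold comes from the workflow.

```python
def filter_ppis(edges, min_score=0.95):
    """Keep interactions at or above min_score and collect the proteins involved."""
    kept = [(a, b, score) for a, b, score in edges if score >= min_score]
    proteins = {protein for a, b, _ in kept for protein in (a, b)}
    return kept, proteins


# Placeholder interactions (hypothetical scores)
example_edges = [
    ("P1", "P2", 0.99),
    ("P2", "P3", 0.95),
    ("P3", "P4", 0.80),  # below the threshold, dropped
]

kept, proteins = filter_ppis(example_edges)
print(len(kept), "PPIs between", len(proteins), "proteins")
```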
3. The selection of compounds interacting with HCC related genes:
3a. Known interactions from DrugBank
- 63 interactions between 21 genes and 57 compounds
- Edge color: Green
- Node color: Red (approved and investigational drugs)
3b. Experimentally measured interactions from PubChem + ChEMBL (ExCAPE dataset)
- Compounds with pXC50 >= 5.0 were labelled as active.
- For each compound, an enrichment score was calculated with a hypergeometric test, based on the ratios of active and inactive datapoints of the compound for HCC network genes and for all targets in the ExCAPE dataset (ChEMBL+PubChem).
- Only compounds with enrichment score > 1 were considered
- Top 5 compounds, which are not similar to each other, were selected based on enrichment scores
- 26 interactions between 11 genes and 12 compounds
- Edge color: Blue
- Node color: Orange (drug-like compounds)
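The exact formula for the enrichment score is not spelled out here, so the sketch below shows one possible reading under stated assumptions: for a single compound, k of its n datapoints on HCC network genes are active, and K of its N datapoints across all targets are active. The score is taken as the ratio (k/n) / (K/N), and the hypergeometric p-value is the probability of seeing at least k actives among n draws. The function names and example measurements are hypothetical; only the pXC50 >= 5.0 threshold comes from the workflow.

```python
from math import comb

ACTIVE_PXC50 = 5.0  # activity threshold stated in the workflow


def is_active(pxc50):
    return pxc50 >= ACTIVE_PXC50


def compound_counts(measurements, hcc_genes):
    """measurements: list of (gene, pXC50) datapoints for one compound."""
    N = len(measurements)
    K = sum(is_active(p) for _, p in measurements)
    on_hcc = [p for gene, p in measurements if gene in hcc_genes]
    n = len(on_hcc)
    k = sum(is_active(p) for p in on_hcc)
    return k, n, K, N


def enrichment_score(k, n, K, N):
    """Assumed definition: active ratio on HCC genes over active ratio overall."""
    if n == 0 or K == 0:
        return 0.0
    return (k / n) / (K / N)


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n)."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Placeholder datapoints for one compound (hypothetical values)
measurements = [("AKT1", 6.2), ("MTOR", 5.4), ("PIK3CA", 4.1),
                ("EGFR", 4.0), ("ABL1", 3.9), ("KDR", 5.1)]
hcc_genes = {"AKT1", "MTOR", "PIK3CA"}  # placeholder gene set

k, n, K, N = compound_counts(measurements, hcc_genes)
score = enrichment_score(k, n, K, N)
pvalue = hypergeom_pvalue(k, n, K, N)
print(f"enrichment={score:.3f}, p={pvalue:.3f}, keep={score > 1}")
```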
3c. Predicted interactions from DEEPScreen
- Predicted interactions were retrieved from DEEPScreen predictions
- For each compound, an enrichment score was calculated with a hypergeometric test, based on the ratios of active and inactive datapoints of the compound for HCC network genes and for all DEEPScreen targets.
- Only compounds with enrichment score > 1 were considered
- Top 5 compounds, which are not similar to each other, were selected based on enrichment scores
- 25 interactions between 5 genes and 23 compounds
- Edge color: Red
- Node color: Orange (if not a drug)
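Steps 3b and 3c both pick the top 5 compounds that are not similar to each other. The similarity measure and cutoff are not stated, so the following is only a sketch of one way to do this: a greedy pass over compounds sorted by enrichment score, using Tanimoto similarity on feature sets with an assumed cutoff of 0.5. The names `tanimoto` and `select_diverse_top`, the cutoff, and the example compounds are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_diverse_top(compounds, k=5, max_similarity=0.5):
    """compounds: list of (compound_id, enrichment_score, feature_set).

    Walk down the ranking and keep a compound only if it is not too similar
    to any compound already kept. The 0.5 cutoff is an assumption.
    """
    selected = []
    for cid, score, feats in sorted(compounds, key=lambda c: c[1], reverse=True):
        if score <= 1:
            break  # only compounds with enrichment score > 1 are considered
        if all(tanimoto(feats, kept) <= max_similarity for _, kept in selected):
            selected.append((cid, feats))
        if len(selected) == k:
            break
    return [cid for cid, _ in selected]


# Placeholder compounds (hypothetical scores and features)
candidates = [
    ("C1", 4.0, {1, 2, 3, 4}),
    ("C2", 3.5, {1, 2, 3, 5}),  # Tanimoto 0.6 to C1, skipped
    ("C3", 3.0, {7, 8, 9}),
    ("C4", 0.8, {10, 11}),      # enrichment score not above 1
]
chosen = select_diverse_top(candidates)
print(chosen)
```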
4. The determination of HCC related pathways and their gene associations:
- Signaling pathways associated with HCC disease pathway (hsa05225) in KEGG
- STRING enrichment application on CytoScape
- FDR cutoff = 0.05
- KEGG signaling pathways >= 5 enriched genes
- 66 interactions between 22 genes and 10 pathways
5. The determination of other diseases associated with HCC related genes:
- Associations between these genes and other diseases
5a. KEGG Disease Terms
- STRING enrichment application on CytoScape
- FDR cutoff = 0.05
- KEGG diseases >= 10 enriched genes
- 72 interactions between 27 genes and 5 diseases
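Steps 4 and 5a apply the same kind of filter to the STRING enrichment results: an FDR cutoff plus a minimum number of enriched genes (>= 5 for pathways, >= 10 for KEGG diseases). A minimal sketch, assuming the enrichment table has been exported as (term, FDR, genes) rows. The function name, the row layout and the example rows are hypothetical, and treating the cutoff as inclusive is an assumption.

```python
def filter_enriched_terms(rows, fdr_cutoff=0.05, min_genes=5):
    """Return gene-term edges for terms passing both filters.

    rows: iterable of (term, fdr, genes). An FDR equal to the cutoff is
    kept here, which is an assumption about how the cutoff is applied.
    """
    edges = []
    for term, fdr, genes in rows:
        if fdr <= fdr_cutoff and len(genes) >= min_genes:
            edges.extend((gene, term) for gene in genes)
    return edges


# Placeholder enrichment rows (hypothetical)
example_rows = [
    ("Term A", 0.001, ["G1", "G2", "G3"]),
    ("Term B", 0.200, ["G1", "G2", "G4"]),  # fails the FDR cutoff
    ("Term C", 0.010, ["G5"]),              # too few enriched genes
]

term_edges = filter_enriched_terms(example_rows, min_genes=2)
print(len(term_edges), "gene-term edges")
```

For the thresholds used in this workflow, the call would use `min_genes=5` for step 4 and `min_genes=10` for step 5a.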
5b. EFO Disease Terms
- EFO disease terms were retrieved from GWAS (Genome-Wide Association Studies) Catalog (https://www.ebi.ac.uk/gwas/docs/file-downloads).
- For each EFO term, an enrichment score and a p-value were calculated based on ratios of EFO terms in HCC genes and in the overall GWAS gene set.
- Only EFO terms with enrichment score > 20 and p-value < 0.005 were considered.
- EFO terms belonging to "disease" root were selected and associated with related genes.
- 35 interactions between 20 genes and 7 EFO disease terms
6. The determination of associations between pathways and diseases:
- Retrieved from KEGG pathways of the network diseases
- 26 interactions between 10 pathways and 5 diseases
7. The determination of associations between genes and HPO terms:
- HPO terms were retrieved from Human Phenotype Ontology database (https://hpo.jax.org/app/)
- For each HPO term, an enrichment score and a p-value were calculated with a hypergeometric test, based on ratios of active & inactive datapoints of HPO terms for HCC network genes and in the overall HPO targets.
- Only HPO terms with enrichment score > 65 and p-value < 10^-5 were considered
- Top 10 HPO terms, which have no parent-child relationship with each other, were selected and associated with related genes
- 120 interactions between 22 genes and 10 HPO terms
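The selection of HPO terms with no parent-child relationship can be sketched as a greedy pass over the ranked terms. The version below treats any ancestor-descendant pair as related, which is a stricter reading than direct parent-child links only and is an assumption, as are the function names and the placeholder ontology.

```python
def ancestors(term, parents):
    """All ancestors of a term, given a mapping term -> list of parent terms."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def select_unrelated_terms(ranked_terms, parents, k=10):
    """Keep the best-ranked terms such that no kept term is an ancestor of another."""
    selected = []
    for term in ranked_terms:
        related = any(
            term in ancestors(kept, parents) or kept in ancestors(term, parents)
            for kept in selected
        )
        if not related:
            selected.append(term)
        if len(selected) == k:
            break
    return selected


# Placeholder ontology: T1 is the parent of T2, T2 is the parent of T3
example_parents = {"T2": ["T1"], "T3": ["T2"], "T4": []}
ranked = ["T3", "T1", "T4", "T2"]  # already sorted by enrichment score

picked = select_unrelated_terms(ranked, example_parents)
print(picked)
```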
The finalized prototype network includes 185 nodes (i.e., genes, compounds, pathways, KEGG and EFO diseases, HPO terms) and 478 edges (i.e., interactions) in total.
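As a quick consistency check, the totals can be reproduced from the per-step counts reported above. The node tally assumes that the compound sets from steps 3a, 3b and 3c do not overlap, which the numbers are consistent with but which is not stated explicitly.

```python
# Edge counts as reported in each step of the workflow
edge_counts = {
    "PPIs (step 2)": 45,
    "DrugBank (3a)": 63,
    "ExCAPE (3b)": 26,
    "DEEPScreen (3c)": 25,
    "gene-pathway (4)": 66,
    "gene-KEGG disease (5a)": 72,
    "gene-EFO disease (5b)": 35,
    "pathway-disease (6)": 26,
    "gene-HPO term (7)": 120,
}

# Node counts as reported; compound sets are assumed to be disjoint
node_counts = {
    "genes": 61,
    "compounds (3a)": 57,
    "compounds (3b)": 12,
    "compounds (3c)": 23,
    "pathways": 10,
    "KEGG diseases": 5,
    "EFO diseases": 7,
    "HPO terms": 10,
}

total_edges = sum(edge_counts.values())
total_nodes = sum(node_counts.values())
print(total_nodes, "nodes,", total_edges, "edges")  # 185 nodes, 478 edges
```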
To load the Hepatocellular Carcinoma Prototype Network on CytoScape:
- Open the session file (Hepatocellular_Carcinoma_Network.cys) directly in the CytoScape application, or, if that does not work:
- Open a new session in CytoScape and import the network file (Hepatocellular_Carcinoma_Network.xgmml) via File -> Import -> Network -> File
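The XGMML file is plain XML, so it can also be inspected without CytoScape. The sketch below counts node and edge elements using the Python standard library. It assumes the usual XGMML layout of `node` and `edge` elements under a `graph` root, and it runs on a small inline sample rather than the real file. The function name `summarize_xgmml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace such as '{http://www.cs.rpi.edu/XGMML}node'."""
    return tag.rsplit("}", 1)[-1]


def summarize_xgmml(root):
    """Count node and edge elements anywhere under the given root element."""
    n_nodes = sum(1 for el in root.iter() if local_name(el.tag) == "node")
    n_edges = sum(1 for el in root.iter() if local_name(el.tag) == "edge")
    return n_nodes, n_edges


# Small inline sample (hypothetical content, not the project file)
SAMPLE = """
<graph label="sample" xmlns="http://www.cs.rpi.edu/XGMML">
  <node id="1" label="GENE_A"/>
  <node id="2" label="COMPOUND_B"/>
  <edge source="1" target="2" label="interacts"/>
</graph>
"""

sample_counts = summarize_xgmml(ET.fromstring(SAMPLE))
print(sample_counts)

# For the real file, the root would be read with:
# root = ET.parse("Hepatocellular_Carcinoma_Network.xgmml").getroot()
```

If the file matches the description above, the same function should report counts in line with the stated network totals.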