This folder contains all the folders and files necessary to run the unsupervised and supervised analyses described in the paper "Integration of machine learning methods to dissect genetically imputed transcriptomic profiles in Alzheimer's Disease".
We used conda to create a custom environment for our analysis:
- paper-env.yml: this file contains the Anaconda environment specification used to run the unsupervised analysis. If you use Anaconda, run the following command in the terminal to recreate our environment:
  $ conda env create --name paper-env --file paper-env.yml
The code used to define the VAE's architecture is from Tybalt.
The files and folders are:
- 01_vae_all_samples_together.py, 02_svm_on_vaes.py, 03_extracting_results_from_svms.py, 04_convert_gene_names.py: these are the Python scripts used for the analysis described in the paper. The number at the beginning of each name gives the order in which the scripts should be run. Each script begins with a brief description of what it does.
- saliency_unsupervised.py: saliency map implementation for the unsupervised analysis.
- multiple_vaes/: because no random seed was set, the results in the paper cannot be reproduced exactly by rerunning the scripts. To address this, this folder contains all the fitted VAE models and the corresponding files generated by the Python scripts, including the important genes for each tissue (in the subfolder results/).
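For anyone rerunning the scripts and wanting repeatable results from one run to the next, the sketch below shows one way to fix the random seeds before training. It is not part of the original scripts: the helper name `set_seeds` is hypothetical, and the NumPy and TensorFlow calls are assumptions about which libraries the environment provides (they are skipped if the library is not installed). Even with fixed seeds, some GPU operations may remain non-deterministic.

```python
import os
import random


def set_seeds(seed: int) -> None:
    """Seed the random number generators that are available (hypothetical helper)."""
    # Only affects Python processes started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)

    # Assumption: NumPy may or may not be installed in the environment.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    # Assumption: a TensorFlow backend may be present; the seeding call
    # differs between TensorFlow 1.x and 2.x, so both are checked.
    try:
        import tensorflow as tf
    except ImportError:
        return
    if hasattr(tf, "random") and hasattr(tf.random, "set_seed"):
        tf.random.set_seed(seed)
    elif hasattr(tf, "set_random_seed"):
        tf.set_random_seed(seed)
```

Calling `set_seeds(42)` at the top of a script, before any model is built, is the intended usage. The fitted models in multiple_vaes/ remain the reference for the results reported in the paper.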
We used Keras to define our supervised recurrent neural network.
The files and folders are:
- train_ADNI.py: this script takes as input the matrices contained in data_adni_1. Note that you need to set the directory according to whether you want to perform the cross-tissue or the single-tissue analysis. It trains the recurrent neural network on a subset of each matrix and saves the resulting models.
- test_ADNI.py: this script reloads the previously saved models and tests the performance of each model on the remaining data of the same tissue. As above, you need to set the directory according to whether you want to perform the cross-tissue or the single-tissue analysis.
- saliency_supervised.py: saliency map implementation for the supervised analysis.
- build_cross_tissue_file.py: this script constructs the matrices used for the cross-tissue analysis.
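The train-on-a-subset, test-on-the-rest workflow described above can be illustrated with a minimal sketch. This is not the code in train_ADNI.py or test_ADNI.py: the function name `split_indices`, the 80% training fraction, and the shuffled split are all assumptions made for illustration, and the actual scripts may select their subsets differently.

```python
import random


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Split sample indices into a training subset and the remaining held-out set.

    Hypothetical helper; the fraction and the shuffling are assumptions.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    indices = list(range(n_samples))
    # A dedicated generator keeps the split repeatable for a given seed.
    random.Random(seed).shuffle(indices)
    n_train = int(round(n_samples * train_fraction))
    # Sorting restores row order within each subset.
    return sorted(indices[:n_train]), sorted(indices[n_train:])
```

Saving the two index lists alongside the trained model is one way to make sure the testing step evaluates only on samples the model has not seen.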
The files and folders are:
- data_adni_1/: this folder contains the dataset used for the analysis.
- ADNI0_cc_status.txt: this file lists the class (Control/0 or AD/1) of each sample in the dataset.
- ENSGid_to_gene_name.tsv: this is the mapping used to convert the selected gene IDs to gene names, as listed in the paper.
- genes_cross_tissue.txt: the list of genes considered for the cross-tissue analysis.
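As an illustration of how a mapping file such as ENSGid_to_gene_name.tsv could be used, the sketch below reads a tab-separated table into a dictionary and converts a list of IDs. The column layout (gene ID in the first column, gene name in the second, no header row) is an assumption, not something stated here; check the file before relying on it. The function names and the example IDs are hypothetical placeholders.

```python
import csv
import io


def load_gene_names(handle):
    """Read a two-column, tab-separated mapping of gene ID to gene name.

    Assumption: ID in column 1, name in column 2, no header row.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        mapping[row[0]] = row[1]
    return mapping


def convert_ids(gene_ids, mapping):
    """Replace each ID by its gene name, keeping the ID when no name is known."""
    return [mapping.get(gene_id, gene_id) for gene_id in gene_ids]


# Placeholder content standing in for the real file.
example = io.StringIO("ENSG_EXAMPLE_1\tGENE_A\nENSG_EXAMPLE_2\tGENE_B\n")
names = convert_ids(["ENSG_EXAMPLE_2", "ENSG_UNKNOWN"], load_gene_names(example))
```

With a real file, the same function can be called as `load_gene_names(open(path, newline=""))`. If the file does have a header row, that row would end up as an ordinary entry in the dictionary and should be skipped.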