PofHemat

For : ZaroNoh et al., 2020
Manuscript title : Mass spectrometry analysis of mouse hematopoietic stem cells and their progenitors reveals differential expression within and between proteome and transcriptome throughout adult and aged hematopoiesis
Last edited : 12/29/2021 [README is still being updated]

For further inquiries, please contact the corresponding author at the e-mail address listed in the eLife publication.

github.com/jnoh4/PofHemat



Table of Contents


I. Lines x - y : Files & Folders Summary (does not directly allude to figures in the paper)
II. Lines x - y : Figures Summary (explains which .py file contributed to each figure in the paper)
III. Lines x - y : .py Summary (explains which figure in the paper each .py file contributed to and how)
IV. Lines x - y : Specs Used (for the software used to run the code in the .py files)
V. Lines x - y :
VI. Lines x - y :



I. Files & Folders Summary



main folders - the top-level folders of the repository

analysis - contains analyses of data
gen_data - contains derivative data (ex. compiled data)
imp_data - contains data imported/downloaded; not generated by code
raw_data - contains the raw, initial mass spec/mRNA data


sub-folders - folders available in some main folders

adult - utilizing data only from the adult mouse
adult_aged - utilizing data from both the adult and aged mice
adult_aHSC - utilizing data only from the adult mouse in addition to the HSC data from the aged mice


sub-sub-folders - folders available in some sub-folders

distribution - deals with the distribution of protein expression/fold changes of protein expression
mRNA - deals with mRNA in addition to protein
uniques - contains information on unique proteomic expression of genes for different subsets of cells
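
The folder levels above compose into nested paths. The following is a minimal sketch, assuming the levels nest in the order main folder / sub-folder / sub-sub-folder. The README states only that sub-folders exist in some main folders, so not every combination is present on disk. `data_path` is a hypothetical helper for illustration, not a function from this repository.

```python
import os

# Folder names taken from the summary above.
MAIN = ("analysis", "gen_data", "imp_data", "raw_data")
SUB = ("adult", "adult_aged", "adult_aHSC")
SUB_SUB = ("distribution", "mRNA", "uniques")


def data_path(main, sub=None, sub_sub=None, root="."):
    """Build root/main[/sub[/sub_sub]], validating each level's name.

    Hypothetical helper: it only checks names, not whether the
    folder actually exists in the repository.
    """
    if main not in MAIN:
        raise ValueError("unknown main folder: %r" % (main,))
    parts = [root, main]
    if sub is not None:
        if sub not in SUB:
            raise ValueError("unknown sub-folder: %r" % (sub,))
        parts.append(sub)
    if sub_sub is not None:
        if sub is None:
            raise ValueError("a sub-sub-folder requires a sub-folder")
        if sub_sub not in SUB_SUB:
            raise ValueError("unknown sub-sub-folder: %r" % (sub_sub,))
        parts.append(sub_sub)
    return os.path.join(*parts)
```

For example, `data_path("analysis", "adult", "mRNA")` builds the path for adult-only protein/mRNA analyses; pass the result to `os.path.isdir` to check that the combination exists in a given checkout.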


.py files - python code used for the paper

(1) 1_raw_gene_prot.py - Get gene names and UniprotIDs that appear in the raw data.
(2) 2_gene_prot_mapping.py - Create mappings between UniprotIDs and gene names from the same gene & decide on a single UniprotID and gene name to use.
(3) 3_data_compile.py - Compile adult and aged data using info from the (2) .py file. Compilation method varies. See .py file directly for more info.
(4) 4_2-D_pca_adult.py - Perform PCA on the compiled data, using only the adult data.
(4a) 4a_2-D_pca_adult_aHSC.py - Perform PCA on the compiled data, using the adult data plus the aged HSC data.
(5) 5_1-D_pca_adult.py - Plot the projection of the PCA centroid values for each cell type onto each PC.
(5a) 5a_1-D_pca_adult_aHSC.py - Repeat the procedure in (5) .py file, but including the aged HSC data.
(7) 7_unique_adult.py - Find unique protein expressions for different combinations of cell types.
(8) 8_mRNA_adult.py - Determine K = 2 GMM parameters for protein/mRNA and analyze the model using a QQ-plot & the relevant Pearson correlation.
(9) 9_final_graph.py - Using GMM parameters determined in (6) .py file, plot distributions against histogram and indicate cutoff values.
(10) 10_final_graph_genes_mRNA.py - Repeat (9) .py file, but using GMM parameters determined in (8) .py file for protein/mRNA analysis.
(11) 11_final_graph_mRNA_mu.py - Plot bar graph of means of high variance Gaussians determined in (8) .py file.
(12) 12_final_graph_genes_break.py - Repeat similar to (9) .py file, but indicating position of certain genes & with graph breaks for better visualization.
(13) 13_final_graph_genes_mRNA_break.py - Repeat similar to (10) .py file, but indicating position of certain genes & with graph breaks for better visualization.
(14) 14_final_graph_mRNA_expl.py - Rather than using a histogram/GMM as in (8) .py file, make a boxplot of the protein/mRNA comparisons.
(15) 15_final_graph_mRNA_mu_1.py - Plot bar graph of means of low variance Gaussians determined in (8) .py file.
(16) 16_final_graph_mRNA_prot.py - Make a boxplot of unnormalized values of log(protein/mRNA) per cell.
(17) 17_final_graph_mRNA_expl_all.py - Repeat (14) .py file, but only using genes expressed in mRNA and protein of ALL cell types, normalized to 1,000,000 sum.
(18) 18_final_graph_mRNA_prot_all.py - Repeat (16) .py file, but only using genes as in (17) .py file. Indicate where certain genes lie along the boxplot.
(19) 19_final_graph_mRNA_prot_zoomed.py - Repeat (18) .py file, but exclude genes and draw means and medians. Calculate protein/mRNA Spearman correlation per cell type.
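
Items (16) to (19) describe normalizing expression to a sum of 1,000,000, taking log(protein/mRNA), and computing a Spearman correlation per cell type. The following is a minimal standard-library sketch of those three steps; it is not the repository's code. The function names, the natural-log base, and the restriction to genes with positive values in both measurements are assumptions made for illustration. In practice, `scipy.stats.spearmanr` computes the same correlation.

```python
import math


def normalize_to_million(values):
    """Scale non-negative values so that they sum to 1,000,000."""
    total = float(sum(values))
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [v * 1e6 / total for v in values]


def log_ratios(protein, mrna):
    """log(protein/mRNA) per gene, keeping genes that are > 0 in both.

    Natural log is an assumption; the README does not state the base.
    """
    return [math.log(p / m) for p, m in zip(protein, mrna) if p > 0 and m > 0]


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = float(len(x))
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A typical call for one cell type would normalize the protein and mRNA vectors separately with `normalize_to_million`, then pass both to `spearman` and to `log_ratios`. Because Spearman correlation depends only on ranks, the normalization step does not change its value.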



II. Figures Summary



MAIN Fig. 1


MAIN Fig. 2


MAIN Fig. 3


MAIN Fig. 4


MAIN Fig. 5


MAIN Fig. 6


MAIN Fig. 7


SUPPLEMENTARY Fig. 1


SUPPLEMENTARY Fig. 2


SUPPLEMENTARY Fig. 3


SUPPLEMENTARY Fig. 4


SUPPLEMENTARY Fig. 5


SUPPLEMENTARY Fig. 6



III. .py Summary



1_raw_gene_prot.py


2_gene_prot_mapping.py


3_data_compile.py


4_2-D_pca_adult.py


4a_2-D_pca_adult_aHSC.py


5_1-D_pca_adult.py


5a_1-D_pca_adult_aHSC.py


6_GMM_adult.py


6a_GMM_adult_aHSC.py


7_unique_adult.py


8_mRNA_adult.py


9_final_graph.py


10_final_graph_genes_mRNA.py


11_final_graph_mRNA_mu.py


12_final_graph_genes_mRNA_break.py


13_final_graph_genes_break.py


14_final_graph_mRNA_expl.py


15_final_graph_mRNA_mu_1.py


16_final_graph_mRNA_prot.py


17_final_graph_mRNA_expl_all.py


18_final_graph_mRNA_prot_all.py


19_final_graph_mRNA_prot_zoomed.py



IV. Specs Used


python 2.7.15
numpy 1.15.3
matplotlib 2.2.3
scipy 1.2.1
scikit-learn 0.20.3
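
Before running the .py files, it may help to compare the installed package versions against the ones listed above. The following is a minimal sketch; `check_versions` is a hypothetical helper, not part of this repository, and it assumes each package exposes a `__version__` attribute (true for the four packages listed).

```python
import importlib

# Versions listed in this README. Keys are import names:
# scikit-learn is imported as "sklearn".
PINNED = {
    "numpy": "1.15.3",
    "matplotlib": "2.2.3",
    "scipy": "1.2.1",
    "sklearn": "0.20.3",
}


def check_versions(pinned=PINNED):
    """Return {module: (expected, found)}.

    found is None when the module cannot be imported or has no
    __version__ attribute.
    """
    report = {}
    for name, expected in pinned.items():
        try:
            module = importlib.import_module(name)
            found = getattr(module, "__version__", None)
        except ImportError:
            found = None
        report[name] = (expected, found)
    return report
```

Calling `check_versions()` and printing the entries where `expected != found` gives a quick list of packages that differ from the environment used for the paper. A differing version does not necessarily mean the code will fail; it only flags where results may not reproduce exactly.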