Skip to content

Latest commit

 

History

History
113 lines (79 loc) · 4.06 KB

File metadata and controls

113 lines (79 loc) · 4.06 KB

LDSC Pipeline

The following code is intended to be run on a cluster using SLURM.

You will need to upload the 'Bed' folders on your cluster (generated by the get_*_input.Rmd files.

Command for uploading the folder containing the .bed files for the Zeisel dataset:

scp -r LDSC/Bed [email protected]:/home/LDSC_Zeisel

Annotation files for partitioned LDSC are also necessary. They can be downloaded here.

Load modules

The following modules have to be loaded on your cluster (or the softwares need to be available in your path):

module load ldsc/1.0.0 
module load bedtools/2.25.0
module load r/3.4.1

Tissue Analysis

Create LDSC annotation files for each cell type based on the .bed files (in the Bed folders) that contain the location of the 10% most specific genes in each cell type (extended coordinates by 100kb up and downstream).

For each .bed file in the folder, the script will generate an LDSC annotation file indicating which SNP are located in the top 10% most specific genes of the cell type (extended coordinates by 100kb up and downstream).

LD scores for this new annotation are then computed.

Note that you will need to do a few steps before you can launch the different scripts:

  1. You need to generate a new folder (1000G_EUR_Phase3_baseline_annot) based on the (1000G_EUR_Phase3_baseline) folder. It's essentially a copy of the folder from LDSC with unzipped annotations only.
ls
baseline.10.annot  baseline.14.annot  baseline.18.annot  baseline.21.annot  baseline.4.annot  baseline.8.annot
baseline.11.annot  baseline.15.annot  baseline.19.annot  baseline.22.annot  baseline.5.annot  baseline.9.annot
baseline.12.annot  baseline.16.annot  baseline.1.annot   baseline.2.annot   baseline.6.annot
baseline.13.annot  baseline.17.annot  baseline.20.annot  baseline.3.annot   baseline.7.annot
  1. From these annot files, you can then create a .bed file with the location of all SNPs used in the baseline model.
for file in baseline*
do
awk 'NR > 1{print "chr"$1"\t"$2"\t"$2"\t"$3}' $file >> tmp.bed
done

You can then sort it using the following command:

sortBed -i tmp.bed > 1000genomes_phase3_SNPs.bed2
rm tmp.bed

The path where the annotations (and newly generated folder and files) are located should be modified in the get_annotation_ldscores_tissue_v2.sh script. See below:

path_name="/nas/depts/007/sullilab/shared/partitioned_LDSC/"
all_snps="1000G_EUR_Phase3_baseline_annot/1000genomes_phase3_SNPs.bed2"
all_annotations="1000G_EUR_Phase3_baseline_annot"
plink_file="1000G_EUR_Phase3_plink"
hapmap_snps="hm_snp.txt"

Note that you need to modify the location of the path and annotations in the get_partitioned_h2_tissue_v2 file as well.

path_name="/nas/depts/007/sullilab/shared/partitioned_LDSC/"

#Downloaded on LDSC wiki
weights="1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC."
frq="1000G_Phase3_frq/1000G.EUR.QC."
all_annotations="1000G_EUR_Phase3_baseline"

Finally, you need to put the fast_match2_minimal.pl script in the folder ($path_name, here "/nas/depts/007/sullilab/shared/partitioned_LDSC/")

Once this is done you can launch the following command to run the script on all .bed files in the Bed folder.

cp get_annotation_ldscores_tissue_v2.sh Bed
cp get_partitioned_h2_tissue_v2.sh Bed
cp get_tissue_pvalue.R Bed
cd Bed
sbatch -t 48:00:00 -n 1 -o log_get_annot_ld_scores_tissue \
--wrap="sh get_annotation_ldscores_tissue_v2.sh"

Heritability enrichment analysis

Once the previous step is finished (all ld scores have been calculated). This step will look for heritability enrichment for the different annotations.

cd Bed
sbatch -t 1:00:00 -n 1 -o log_get_partitioned_finucane_h2clozuk \
--wrap="sh get_partitioned_h2_tissue_v2.sh int.sumstats.gz"

Get Pvalues

Once the previous steps are finished, you can gather all pvalues for the annotations using the following command:

This need to be launched in the folder where all the bed files are located

sbatch -t 1:00:00 -n 1 -o log_get_pvalues --wrap="Rscript get_tissue_pvalue.R"