Shuhua-Group/RNA-Seq

Expression profiles of east-west highly differentiated genes in Uyghur genomes

Introduction

This repository contains the data-processing pipeline used in the manuscript "Expression profiles of east-west highly differentiated genes in Uyghur genomes". Starting from per-sample FASTQ files, the pipeline quantifies gene expression and then carries out the downstream enrichment analyses.

RNA-seq data processing

1. Quality assessment

fastqc ${fq_dir}/${sampleID}_R1.fastq.gz ${fq_dir}/${sampleID}_R2.fastq.gz --outdir $wd/Quality_Assessment/${sampleID}
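The command above is written for a single `${sampleID}`. As a minimal sketch of how it could be driven over a whole directory, the Python helpers below list the sample IDs that have both `_R1` and `_R2` files and build the matching FastQC argument list. The function names `find_paired_samples` and `fastqc_command` are hypothetical and are not part of this repository.

```python
from pathlib import Path


def find_paired_samples(fq_dir):
    """Return sorted sample IDs that have both an _R1 and an _R2 FASTQ file."""
    fq_dir = Path(fq_dir)
    suffix = "_R1.fastq.gz"
    samples = []
    for r1 in fq_dir.glob("*" + suffix):
        sample_id = r1.name[: -len(suffix)]
        # Only keep samples whose mate file is present as well.
        if (fq_dir / (sample_id + "_R2.fastq.gz")).exists():
            samples.append(sample_id)
    return sorted(samples)


def fastqc_command(fq_dir, sample_id, wd):
    """Build the FastQC argument list for one sample, mirroring the command above."""
    fq_dir = Path(fq_dir)
    return [
        "fastqc",
        str(fq_dir / f"{sample_id}_R1.fastq.gz"),
        str(fq_dir / f"{sample_id}_R2.fastq.gz"),
        "--outdir",
        str(Path(wd) / "Quality_Assessment" / sample_id),
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. The output directory should be created first (for example with `os.makedirs(path, exist_ok=True)`), since FastQC is expected to refuse a missing `--outdir`.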

2. Trim

mkdir -p ${fq_dir}/trim
cd ${fq_dir}/trim
trim_galore -q 20 --trim1 --paired --fastqc ${fq_dir}/${sampleID}_R1.fastq.gz ${fq_dir}/${sampleID}_R2.fastq.gz

3. Mapping with STAR

Building the STAR index

fasta="$wd/ref/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa"
gtf="$wd/ref/Homo_sapiens.GRCh37.75.gtf"
mkdir -p $wd/ref/$STARgenomeDir

STAR --runThreadN 5 --runMode genomeGenerate --genomeDir $wd/ref/$STARgenomeDir --genomeFastaFiles $fasta --sjdbGTFfile $gtf --limitGenomeGenerateRAM 50503721344 --limitSjdbInsertNsj 3000000 --limitIObufferSize 6368709120 --sjdbOverhang 99 --outFileNamePrefix $wd/$STARgenomeDir

Mapping

mkdir -p $wd/mapping

sh star_mapping.sh ${fq_dir}/trim/${sampleID}_R1.fastq.gz ${fq_dir}/trim/${sampleID}_R2.fastq.gz $wd/ref/$STARgenomeDir  ${num_thread} $wd/mapping

4. Quantify expression level with RSEM

Building the RSEM index

rsem-prepare-reference $wd/ref/ref_ensemble/ $wd/ref/ref_RSEM/Homo_sapiens.GRCh37.75 --gtf $gtf -p 5

RSEM running

sh RSEM.sh ${sampleID} $wd/mapping $wd/ref/ref_RSEM/Homo_sapiens.GRCh37.75 ${num_thread} $wd/RSEM/

5. Merging and QC of the FPKM matrix

Merging the RSEM results into an FPKM matrix

python expr_merge.py
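The internals of `expr_merge.py` are not shown here, so the following is only a sketch of what such a merge typically looks like, not the script itself. It assumes that each sample has an RSEM gene-level output named `<sampleID>.genes.results` in one directory (RSEM writes a tab-separated table with `gene_id` and `FPKM` columns, among others) and that a gene missing from a sample should be written as 0.0. The function names are hypothetical.

```python
import csv
from pathlib import Path


def read_fpkm(results_path):
    """Read one RSEM *.genes.results file into a {gene_id: FPKM} dict."""
    with open(results_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["gene_id"]: float(row["FPKM"]) for row in reader}


def merge_fpkm(rsem_dir, out_path):
    """Merge all *.genes.results files in rsem_dir into a gene-by-sample table."""
    suffix = ".genes.results"
    files = sorted(Path(rsem_dir).glob("*" + suffix))
    samples = [f.name[: -len(suffix)] for f in files]
    per_sample = [read_fpkm(f) for f in files]
    # Union of gene IDs across samples, in a stable sorted order.
    genes = sorted(set().union(*per_sample)) if per_sample else []
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["gene_id"] + samples)
        for gene in genes:
            # Assumption: a gene absent from a sample is reported as 0.0.
            writer.writerow([gene] + [fpkm.get(gene, 0.0) for fpkm in per_sample])
    return samples, genes
```

The result is one row per gene and one column per sample, which is the layout the QC, PCA, and PEER steps would usually expect.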

FPKM QC

Rscript exp_distribu.r
python expr_detect.py
Rscript exp_distribu_filter.r
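The criteria applied by the QC scripts are not stated in this README. As an illustration only, a common detection filter keeps genes that are expressed above some FPKM threshold in a minimum fraction of samples. The sketch below assumes a matrix held as `{gene_id: [FPKM per sample]}`. The function name `filter_detected` and the default thresholds (`min_fpkm=0.1`, `min_fraction=0.8`) are placeholders, not values taken from the manuscript.

```python
def filter_detected(matrix, min_fpkm=0.1, min_fraction=0.8):
    """Keep genes whose FPKM exceeds min_fpkm in at least min_fraction of samples.

    matrix maps gene_id -> list of FPKM values (one per sample).
    Both thresholds are illustrative placeholders.
    """
    kept = {}
    for gene, values in matrix.items():
        if not values:
            continue  # no samples, nothing to evaluate
        detected = sum(1 for v in values if v > min_fpkm)
        if detected / len(values) >= min_fraction:
            kept[gene] = values
    return kept
```

Plotting the FPKM distribution before and after such a filter is one way to check that the threshold removes mostly unexpressed genes.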

PCA analysis of FPKM matrix

Rscript exp_pca.r

Normalization of FPKM matrix with PEER

Rscript exp_peer.r

Functional analysis

1. ASE related analysis

ASE and aseQTL calling was performed with the Python package "ASEkit", developed by our team.

2. QTL analysis

The input data follow the same format as the example files shipped with the R package "MatrixEQTL".

Rscript MatrixEQTL.r $geno $exp $cov $gene_loc $snploc $cis_res $trans_res
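In the MatrixEQTL example files, the genotype, expression, and covariate tables each have a header row with an ID column followed by sample names, and the sample columns are expected to appear in the same order across files. The sketch below is a small pre-flight check for that condition. It assumes tab-separated files with a single header line. The function names are hypothetical and not part of this repository or of MatrixEQTL.

```python
def read_sample_header(path):
    """Return the sample IDs from the header row (every column after the first)."""
    with open(path) as handle:
        return handle.readline().rstrip("\r\n").split("\t")[1:]


def check_sample_order(geno_path, expr_path, cov_path=None):
    """Raise ValueError unless all files list the same samples in the same order."""
    samples = read_sample_header(geno_path)
    to_check = [("expression", expr_path)]
    if cov_path is not None:
        to_check.append(("covariates", cov_path))
    for label, path in to_check:
        if read_sample_header(path) != samples:
            raise ValueError(f"{label} columns do not match the genotype columns")
    return samples
```

Running `check_sample_order(geno, exp, cov)` before the `Rscript` call can catch mismatched or reordered samples early, instead of relying on the R script to report them.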

3. Roadmap enrichment

The inputs are two-column, tab-separated, BED-format files.

python2 roadmap.enrich.py --cis ${cis_loci} --bg ${bg_loci} --out ${prefix}

4. Location enrichment

The input is a two-column, tab-separated, BED-format file.

python2 loc.enrich.py --cis ${cis_loci} --out ${prefix}

5. GWAS enrichment

The first column of the input file contains the Ensembl IDs of the candidate genes.

python GWAS.enrich.py ${gene_file}

6. Gene pathway enrichment

The first column of each input file contains the Ensembl IDs of the candidate or background genes.

python gene_func.enrich.py ${candidate_genes} ${background_genes} ${out}
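The statistical test used inside `gene_func.enrich.py` is not documented here. A widely used choice for candidate-versus-background enrichment is the one-sided hypergeometric test, and the sketch below shows that test under that assumption. It is not a description of the script. The function names and the `pathway_genes` argument are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_candidates, n_overlap):
    """P(X >= n_overlap) when n_candidates genes are drawn without replacement
    from n_background genes, of which n_pathway belong to the pathway."""
    max_overlap = min(n_pathway, n_candidates)
    total = comb(n_background, n_candidates)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_candidates - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def pathway_enrichment(candidates, background, pathway_genes):
    """Return (overlap size, p-value) for one pathway, restricted to the background."""
    background = set(background)
    candidates = set(candidates) & background
    pathway = set(pathway_genes) & background
    overlap = candidates & pathway
    p_value = hypergeom_enrichment_p(
        len(background), len(pathway), len(candidates), len(overlap)
    )
    return len(overlap), p_value
```

When many pathways are tested, the resulting p-values would normally be corrected for multiple testing before interpretation.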

Contributors

Zhilin Ning

Xinjiang Tan

Yuan Yuan

Ke Huang

Yuwen Pan

Lei Tian
