Expression profiles of east-west highly differentiated genes in Uyghur genomes

Introduction

This repository contains the pipeline used to process the data for the manuscript "Expression profiles of east-west highly differentiated genes in Uyghur genomes". Starting from per-sample FASTQ files, it quantifies gene expression and then runs the downstream functional and enrichment analyses.

RNA-seq data processing

1. Quality assessment

mkdir -p $wd/Quality_Assessment/${sampleID}
fastqc ${fq_dir}/${sampleID}_R1.fastq.gz ${fq_dir}/${sampleID}_R2.fastq.gz --outdir $wd/Quality_Assessment/${sampleID}
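FastQC also stores a `summary.txt` in each report (inside the `.zip` archive, or in the extracted `_fastqc` folder), with one tab-separated `STATUS<TAB>module<TAB>file` line per module, where STATUS is PASS, WARN or FAIL. When many samples are processed, a small parser helps triage them. This is an illustrative sketch that is not part of the repository; the function name `flagged_modules` is hypothetical.

```python
from collections import defaultdict


def flagged_modules(summary_text, statuses=("FAIL",)):
    """Collect FastQC modules whose status is in `statuses`, grouped by input file.

    `summary_text` is the content of a FastQC summary.txt file, where each
    non-empty line is "STATUS<TAB>Module name<TAB>File name".
    """
    flagged = defaultdict(list)
    for line in summary_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        status, module, filename = line.split("\t", 2)
        if status in statuses:
            flagged[filename].append(module)
    return dict(flagged)


# Example summary for one made-up sample
summary = (
    "PASS\tBasic Statistics\tS1_R1.fastq.gz\n"
    "FAIL\tPer base sequence content\tS1_R1.fastq.gz\n"
    "WARN\tSequence Duplication Levels\tS1_R1.fastq.gz\n"
)
print(flagged_modules(summary))
print(flagged_modules(summary, statuses=("FAIL", "WARN")))
```

In RNA-seq libraries, "Per base sequence content" is often flagged because of random-hexamer priming bias, so a FAIL there is not by itself a reason to drop a sample.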

2. Trim

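mkdir -p ${fq_dir}/trim
cd ${fq_dir}/trim
trim_galore -q 20 --trim1 --paired --fastqc ${fq_dir}/${sampleID}_R1.fastq.gz ${fq_dir}/${sampleID}_R2.fastq.gz

By default, Trim Galore names the trimmed pairs ${sampleID}_R1_val_1.fq.gz and ${sampleID}_R2_val_2.fq.gz. Rename them to the file names read by the mapping step:

mv ${fq_dir}/trim/${sampleID}_R1_val_1.fq.gz ${fq_dir}/trim/${sampleID}_R1.fastq.gz
mv ${fq_dir}/trim/${sampleID}_R2_val_2.fq.gz ${fq_dir}/trim/${sampleID}_R2.fastq.gz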
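Trim Galore hands quality trimming to Cutadapt, and `-q 20` becomes the 3' quality cutoff. The rule used by Cutadapt and BWA walks from the 3' end adding (cutoff - quality), stops once the running sum turns negative, and cuts where the sum was largest. The sketch below reimplements that rule for illustration only. It assumes Phred+33 encoding, it is not the Cutadapt source, and `quality_trim_index` and `encode` are hypothetical names.

```python
def quality_trim_index(qualities, cutoff, base=33):
    """Return the 3' cut position for a read (bases [0, index) are kept).

    Walk from the 3' end adding (cutoff - quality); stop as soon as the
    running sum goes negative, and cut where the sum was largest.
    `qualities` is a FASTQ quality string with the given Phred offset.
    """
    running = 0
    best = 0
    cut = len(qualities)  # default: keep the whole read
    for i in range(len(qualities) - 1, -1, -1):
        running += cutoff - (ord(qualities[i]) - base)
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut


def encode(phred_scores, base=33):
    """Encode integer Phred scores as a FASTQ quality string."""
    return "".join(chr(q + base) for q in phred_scores)


# Worked example with cutoff 10
quals = encode([42, 40, 26, 27, 8, 7, 11, 4, 2, 3])
print(quality_trim_index(quals, 10))             # 4: the first four bases are kept
print(quality_trim_index(encode([40] * 6), 20))  # 6: nothing is trimmed
```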

3. Mapping with STAR

Building the STAR index

fasta="$wd/ref/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa"
gtf="$wd/ref/Homo_sapiens.GRCh37.75.gtf"
mkdir -p $wd/ref/$STARgenomeDir

STAR --runThreadN 5 --runMode genomeGenerate --genomeDir $wd/ref/$STARgenomeDir --genomeFastaFiles $fasta --sjdbGTFfile $gtf --limitGenomeGenerateRAM 50503721344 --limitSjdbInsertNsj 3000000 --limitIObufferSize 6368709120 --sjdbOverhang 99 --outFileNamePrefix $wd/ref/$STARgenomeDir/
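The STAR manual gives simple rules for two genomeGenerate settings. `--sjdbOverhang` should ideally be the maximum read length minus 1, so the value 99 used here corresponds to 100-bp reads (an inference; the read length is not stated above). `--genomeSAindexNbases` should be scaled down to min(14, log2(GenomeLength)/2 - 1) for small genomes. A minimal sketch of both rules; the function names are hypothetical.

```python
import math


def star_sjdb_overhang(read_length):
    """STAR manual recommendation: sjdbOverhang = max read length - 1."""
    if read_length < 2:
        raise ValueError("read length must be at least 2")
    return read_length - 1


def star_sa_index_nbases(genome_length):
    """STAR manual recommendation: min(14, log2(GenomeLength)/2 - 1), rounded down."""
    return min(14, int(math.log2(genome_length) / 2 - 1))


print(star_sjdb_overhang(100))              # 99
print(star_sa_index_nbases(3_100_000_000))  # 14 for a human-sized genome
print(star_sa_index_nbases(1_000_000))      # 8 for a 1 Mb genome
```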

Mapping

mkdir -p $wd/mapping

sh star_mapping.sh ${fq_dir}/trim/${sampleID}_R1.fastq.gz ${fq_dir}/trim/${sampleID}_R2.fastq.gz $wd/ref/$STARgenomeDir  ${num_thread} $wd/mapping
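STAR writes per-run statistics to `Log.final.out` as `metric | value` lines grouped under section headers. Assuming `star_mapping.sh` keeps STAR's default log file names (the script is not shown here), this sketch extracts a sample's uniquely mapped rate. The function names are hypothetical.

```python
def parse_star_final_log(text):
    """Parse STAR's Log.final.out into a {metric: value-string} dict.

    Data lines look like "   Uniquely mapped reads % |\t92.15%"; section
    headers such as "UNIQUE READS:" contain no "|" and are skipped.
    """
    metrics = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        key, value = line.split("|", 1)
        metrics[key.strip()] = value.strip()
    return metrics


def unique_mapping_rate(metrics):
    """Return the uniquely mapped percentage as a float (e.g. 92.15)."""
    return float(metrics["Uniquely mapped reads %"].rstrip("%"))


# Excerpt of a made-up Log.final.out
log_text = (
    "                          Number of input reads |\t1000000\n"
    "                                UNIQUE READS:\n"
    "                   Uniquely mapped reads number |\t921500\n"
    "                        Uniquely mapped reads % |\t92.15%\n"
)
stats = parse_star_final_log(log_text)
print(unique_mapping_rate(stats))  # 92.15
```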

4. Quantify expression level with RSEM

Building the RSEM index

rsem-prepare-reference $wd/ref/ref_ensemble/ $wd/ref/ref_RSEM/Homo_sapiens.GRCh37.75 --gtf $gtf -p 5

RSEM running

sh RSEM.sh ${sampleID} $wd/mapping $wd/ref/ref_RSEM/Homo_sapiens.GRCh37.75 ${num_thread} $wd/RSEM/
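RSEM reports both TPM and FPKM for each gene in its `*.genes.results` files. For reference, the sketch below gives the standard definitions in terms of fragment counts and lengths. RSEM itself uses expected counts and effective lengths, so its values can differ from this simplified version. The function names are illustrative.

```python
def fpkm(counts, lengths_bp, total_fragments=None):
    """FPKM_i = count_i * 1e9 / (length_i * N).

    N defaults to the sum of `counts` (assumes every mapped fragment is counted).
    """
    n = sum(counts) if total_fragments is None else total_fragments
    return [c * 1e9 / (length * n) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """TPM_i = (count_i / length_i) / sum_j(count_j / length_j) * 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


counts = [100, 300]
lengths = [1000, 2000]
print(fpkm(counts, lengths))  # [250000.0, 375000.0]
print(tpm(counts, lengths))   # approximately [400000, 600000]
```

Within a sample, TPM is FPKM rescaled to sum to one million, so TPM_i = FPKM_i / sum_j FPKM_j * 1e6.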

5. Merging and QC of the FPKM matrix

Merging the RSEM results into an FPKM matrix

python expr_merge.py
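`expr_merge.py` is not reproduced here. As an illustration only, a merge of RSEM gene-level outputs could read the `gene_id` and `FPKM` columns of each `*.genes.results` file and write a gene x sample table. The sample-to-file mapping, the NA placeholder and the output layout are assumptions.

```python
import csv
import os
import tempfile


def merge_rsem_fpkm(result_files, out_path, column="FPKM"):
    """Merge RSEM *.genes.results files into a tab-separated gene x sample matrix.

    result_files: {sample_id: path}. Genes absent from a sample are written as NA.
    Returns (number of genes, number of samples).
    """
    samples = list(result_files)
    matrix = {}  # gene_id -> {sample_id: value}
    for sample, path in result_files.items():
        with open(path, newline="") as fh:
            for row in csv.DictReader(fh, delimiter="\t"):
                matrix.setdefault(row["gene_id"], {})[sample] = row[column]
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["gene_id"] + samples)
        for gene in sorted(matrix):
            writer.writerow([gene] + [matrix[gene].get(s, "NA") for s in samples])
    return len(matrix), len(samples)


# Demonstration with two tiny, made-up RSEM outputs
header = "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"
tmp = tempfile.mkdtemp()
files = {}
for sample, rows in {"S1": ["G1\tT1\t1000\t900\t10\t5.0\t4.0\n"],
                     "S2": ["G1\tT1\t1000\t900\t20\t9.0\t8.0\n",
                            "G2\tT2\t500\t400\t3\t2.0\t1.5\n"]}.items():
    files[sample] = os.path.join(tmp, f"{sample}.genes.results")
    with open(files[sample], "w") as fh:
        fh.write(header + "".join(rows))

out = os.path.join(tmp, "fpkm_matrix.tsv")
shape = merge_rsem_fpkm(files, out)
with open(out) as fh:
    merged = fh.read()
print(shape)  # (2, 2)
print(merged)
```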

FPKM QC

Rscript exp_distribu.r
python expr_detect.py
Rscript exp_distribu_filter.r
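The filtering criteria used by `expr_detect.py` are not described here. As a hypothetical example of a detection filter, this sketch keeps genes whose FPKM exceeds 0.1 in at least 20% of samples. Both thresholds are placeholders and should be replaced with the values actually used.

```python
def detected_genes(fpkm_matrix, min_fpkm=0.1, min_fraction=0.2):
    """Return genes with FPKM > min_fpkm in at least min_fraction of samples.

    fpkm_matrix: {gene_id: [FPKM per sample]}. Both thresholds are placeholders.
    """
    kept = []
    for gene, values in fpkm_matrix.items():
        expressed = sum(1 for v in values if v > min_fpkm)
        if values and expressed / len(values) >= min_fraction:
            kept.append(gene)
    return kept


matrix = {
    "G1": [0.0, 0.0, 0.0, 0.5, 2.0],   # expressed in 2 of 5 samples -> kept
    "G2": [0.0, 0.05, 0.0, 0.0, 0.0],  # never above 0.1 -> dropped
    "G3": [1.0, 1.2, 0.9, 1.1, 1.0],   # expressed everywhere -> kept
}
print(detected_genes(matrix))  # ['G1', 'G3']
```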

PCA of the FPKM matrix

Rscript exp_pca.r
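`exp_pca.r` is an R script whose details are not shown, and in R `prcomp` is the usual tool. The dependency-free sketch below illustrates the idea only. It assumes PCA on log2(FPKM + 1) values centred per gene (the transform is an assumption) and computes each sample's PC1 score by power iteration on the sample Gram matrix. All names are hypothetical.

```python
import math
import random


def log_center(samples):
    """log2(x + 1)-transform a samples x genes matrix, then center each gene (column)."""
    logged = [[math.log2(v + 1.0) for v in row] for row in samples]
    n = len(logged)
    means = [sum(col) / n for col in zip(*logged)]
    return [[v - m for v, m in zip(row, means)] for row in logged]


def first_pc_scores(samples, iterations=500, seed=0):
    """Return each sample's score on PC1 (the sign is arbitrary).

    Power iteration on G = X X^T (samples x samples), X being the centered log
    matrix; the top eigenvector u of G scaled by sqrt(eigenvalue) equals the
    PC1 scores.
    """
    x = log_center(samples)
    n = len(x)
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)] for i in range(n)]
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(gram[i][k] * u[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            return [0.0] * n  # no variation between samples
        eigenvalue = norm  # ||G u|| converges to the top eigenvalue
        u = [v / norm for v in w]
    return [v * math.sqrt(eigenvalue) for v in u]


# Made-up FPKM values: rows are samples, columns are genes
fpkm_by_sample = [[1, 1, 100], [2, 1, 110], [100, 1, 1], [110, 2, 1]]
scores = first_pc_scores(fpkm_by_sample)
print(scores)  # the first two samples fall on the opposite side of PC1 from the last two
```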

Normalization of the FPKM matrix with PEER

Rscript exp_peer.r
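The number of PEER factors used is not stated here. One widely used convention, from the GTEx v8 eQTL analyses, chooses it from the sample size N: 15 factors for N < 150, 30 for 150 <= N < 250, 45 for 250 <= N < 350, and 60 for N >= 350. This sketch encodes that rule for reference only; `exp_peer.r` may use a different number, and the function name is hypothetical.

```python
def gtex_peer_factor_count(n_samples):
    """Number of PEER factors under the GTEx v8 sample-size rule."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if n_samples < 150:
        return 15
    if n_samples < 250:
        return 30
    if n_samples < 350:
        return 45
    return 60


print(gtex_peer_factor_count(100))  # 15
```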

Functional analysis

1. ASE related analysis

ASE (allele-specific expression) calling and aseQTL mapping were performed with ASEkit, a Python package developed by our team.
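ASEkit's methods are not described here. For orientation only: a common way to call ASE at a heterozygous site is an exact two-sided binomial test of the reference-allele read count against an expected ratio of 0.5. This sketch implements that generic test and does not model reference-mapping bias or overdispersion. `min_depth` and all function names are hypothetical.

```python
from math import exp, lgamma, log


def _binom_log_pmf(k, n, p):
    """log P(X = k) for X ~ Binomial(n, p), in log space to avoid overflow."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def binom_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    log_obs = _binom_log_pmf(k, n, p)
    total = 0.0
    for i in range(n + 1):
        log_i = _binom_log_pmf(i, n, p)
        if log_i <= log_obs + 1e-7:  # small tolerance so ties survive rounding
            total += exp(log_i)
    return min(1.0, total)


def ase_test(ref_count, alt_count, min_depth=10):
    """Return (reference allele ratio, p-value), or None if coverage is too low."""
    depth = ref_count + alt_count
    if depth < min_depth:
        return None
    return ref_count / depth, binom_two_sided(ref_count, depth)


print(ase_test(30, 10))  # ratio 0.75 with a small p-value
print(ase_test(3, 1))    # None: below min_depth
```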

2. QTL analysis

The input files follow the same format as the examples provided with the R package MatrixEQTL.

Rscript MatrixEQTL.r $geno $exp $cov $gene_loc $snploc $cis_res $trans_res
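In the MatrixEQTL examples, the genotype, expression and covariate files are tab-separated matrices with one feature per row and one sample per column. Each has a header row of sample IDs and the feature ID in the first column (the example scripts skip one row and one column when loading them). Gene locations are a four-column table of gene ID, chromosome, start and end. This sketch writes such files from Python. The header labels mirror the example files and are otherwise arbitrary, and the function names, sample IDs and coordinates are hypothetical.

```python
import csv
import io


def write_feature_matrix(handle, id_label, samples, rows):
    """Write a features x samples matrix: header = id_label + sample IDs.

    rows: iterable of (feature_id, values), values in the same order as `samples`.
    Keep the sample order identical across genotype, expression and covariate files.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([id_label] + list(samples))
    for feature_id, values in rows:
        if len(values) != len(samples):
            raise ValueError(f"{feature_id}: expected {len(samples)} values, got {len(values)}")
        writer.writerow([feature_id] + list(values))


def write_gene_locations(handle, genes):
    """genes: iterable of (gene_id, chrom, start, end)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["geneid", "chr", "s1", "s2"])
    writer.writerows(genes)


samples = ["S1", "S2", "S3"]
geno = io.StringIO()
write_feature_matrix(geno, "snpid", samples, [("rs1", [0, 1, 2]), ("rs2", [2, 2, 1])])
loc = io.StringIO()
write_gene_locations(loc, [("GENE1", "chr1", 1000, 5000)])  # made-up gene
print(geno.getvalue())
print(loc.getvalue())
```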

3. Roadmap enrichment

The inputs are two-column, tab-separated, BED-format files.

python2 roadmap.enrich.py --cis ${cis_loci} --bg ${bg_loci} --out ${prefix}
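`roadmap.enrich.py` (Python 2) is not reproduced here, and its statistics are not described. As an illustration of the general idea only, this sketch counts how many cis and background loci fall inside a set of annotated regions (for example, one Roadmap chromatin state). It then tests for enrichment with a one-sided Fisher's exact test. It assumes loci are given as chromosome and 0-based position, regions as BED-style half-open intervals, and that the cis and background sets do not overlap. All names are hypothetical.

```python
from bisect import bisect_right
from math import comb


def index_regions(regions):
    """Merge overlapping BED-style [start, end) regions per chromosome for lookup."""
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_regions(index, chrom, pos):
    """True if the 0-based position lies inside a merged region."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test (enrichment) for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    numerator = sum(comb(col1, x) * comb(n - col1, row1 - x)
                    for x in range(a, min(row1, col1) + 1))
    return numerator / comb(n, row1)


def region_enrichment(cis_loci, bg_loci, regions):
    """Return (odds ratio, one-sided p) for cis vs background overlap with regions."""
    index = index_regions(regions)
    a = sum(in_regions(index, chrom, pos) for chrom, pos in cis_loci)
    c = sum(in_regions(index, chrom, pos) for chrom, pos in bg_loci)
    b, d = len(cis_loci) - a, len(bg_loci) - c
    if b * c == 0:
        odds = float("inf") if a * d else float("nan")
    else:
        odds = (a * d) / (b * c)
    return odds, fisher_greater(a, b, c, d)


cis = [("chr1", 150), ("chr1", 160), ("chr1", 170)]
bg = [("chr1", 10), ("chr1", 20), ("chr1", 30)]
print(region_enrichment(cis, bg, [("chr1", 100, 200)]))  # (inf, 0.05)
```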

4. Location enrichment

The input is a two-column, tab-separated, BED-format file.

python2 loc.enrich.py --cis ${cis_loci} --out ${prefix}
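The location categories used by `loc.enrich.py` are not described here. One common way to summarise where cis loci sit is to measure each locus's distance to the nearest transcription start site (TSS) and bin those distances, as this sketch does. The TSS table, the unsigned distance and the bin edges (1 kb, 10 kb, 100 kb) are all assumptions, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def nearest_tss_distance(tss_by_chrom, chrom, pos):
    """Unsigned distance (bp) from pos to the closest TSS on chrom, or None.

    tss_by_chrom: {chrom: sorted list of TSS positions}.
    """
    tss = tss_by_chrom.get(chrom)
    if not tss:
        return None
    i = bisect_left(tss, pos)
    candidates = []
    if i < len(tss):
        candidates.append(tss[i] - pos)  # nearest TSS at or after pos
    if i > 0:
        candidates.append(pos - tss[i - 1])  # nearest TSS before pos
    return min(candidates)


def bin_distances(distances, edges=(1_000, 10_000, 100_000)):
    """Count distances per bin; default bins are [0, 1 kb), [1, 10 kb), [10, 100 kb), >= 100 kb."""
    counts = [0] * (len(edges) + 1)
    for d in distances:
        if d is not None:  # loci on chromosomes without any TSS are skipped
            counts[bisect_right(edges, d)] += 1
    return counts


tss = {"chr1": [1_000, 50_000]}  # made-up TSS positions
loci = [("chr1", 1_200), ("chr1", 20_000), ("chr1", 200_000), ("chr2", 5_000)]
distances = [nearest_tss_distance(tss, c, p) for c, p in loci]
print(distances)                 # [200, 19000, 150000, None]
print(bin_distances(distances))  # [1, 0, 1, 1]
```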

5. GWAS enrichment

The first column of the input file contains the Ensembl IDs of the candidate genes.

python GWAS.enrich.py ${gene_file}
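Because only the first column is used, it helps to normalise the gene IDs before running the enrichment. This hypothetical sketch is not part of `GWAS.enrich.py`. It takes the first field of each line, strips version suffixes (such as the `.15` in `ENSG00000139618.15`) and removes duplicates. Values that do not look like a human Ensembl gene ID (`ENSG` followed by 11 digits), such as a header line, are set aside. Whether the repository's scripts expect versioned IDs is not stated.

```python
import re

ENSEMBL_HUMAN_GENE = re.compile(r"ENSG\d{11}")


def read_gene_ids(lines):
    """Return (unique Ensembl gene IDs in input order, skipped first-column values)."""
    genes, skipped, seen = [], [], set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        gene = fields[0].split(".", 1)[0]  # drop a version suffix, if any
        if ENSEMBL_HUMAN_GENE.fullmatch(gene):
            if gene not in seen:
                seen.add(gene)
                genes.append(gene)
        else:
            skipped.append(fields[0])
    return genes, skipped


example = ["gene_id\tscore", "ENSG00000139618.15\t3.2", "ENSG00000139618\t1.0", "", "BRCA2\t0.5"]
print(read_gene_ids(example))  # (['ENSG00000139618'], ['gene_id', 'BRCA2'])
```

With a real file, `with open(gene_file) as fh: genes, skipped = read_gene_ids(fh)` works the same way, since iterating a file yields its lines.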

6. Gene pathway enrichment

The first column of each input file contains the Ensembl IDs of the candidate or background genes.

Rscript gene_func.enrich.r ${candidate_genes} ${background_genes} ${out}
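The statistics behind the gene pathway enrichment step are not described here. This hedged sketch shows a standard over-representation analysis. Candidates are restricted to the background, each pathway's overlap is tested with a one-sided hypergeometric test, and the p-values are adjusted across pathways with the Benjamini-Hochberg procedure. The pathway gene sets and all function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    numerator = sum(comb(successes, x) * comb(population - successes, draws - x)
                    for x in range(k, upper + 1))
    return numerator / comb(population, draws)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def pathway_enrichment(candidates, background, pathways):
    """Return {pathway: (overlap, p-value, BH-adjusted p-value)}."""
    universe = set(background)
    hits = set(candidates) & universe
    names = list(pathways)
    overlaps, pvalues = [], []
    for name in names:
        members = set(pathways[name]) & universe
        k = len(hits & members)
        overlaps.append(k)
        pvalues.append(hypergeom_sf(k, len(universe), len(members), len(hits)))
    qvalues = benjamini_hochberg(pvalues)
    return {name: (k, p, q) for name, k, p, q in zip(names, overlaps, pvalues, qvalues)}


background = [f"G{i}" for i in range(10)]
candidates = ["G0", "G1", "G2"]
pathways = {"P1": {"G0", "G1", "G2"}, "P2": {"G7", "G8"}}
results = pathway_enrichment(candidates, background, pathways)
print(results)  # P1: overlap 3, p = 1/120, adjusted p = 1/60; P2: overlap 0, p = 1
```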

Contribution

Zhilin Ning

Xinjiang Tan

Yuan Yuan

Ke Huang

Yuwen Pan

Lei Tian
