This document describes the data-processing pipeline used in the manuscript "Expression profiles of east-west highly differentiated genes in Uyghur genomes". Starting from per-sample FASTQ files, the pipeline quantifies gene expression and then carries out the downstream analyses: eQTL and allele-specific expression (ASE) calling, and enrichment analyses.
1. Quality assessment with FastQC
mkdir -p $wd/Quality_Assessment/${sampleID}
fastqc ${fq_dir}/${sampleID}_R1.fastq.gz ${fq_dir}/${sampleID}_R2.fastq.gz --outdir $wd/Quality_Assessment/${sampleID}
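2. Quality and adapter trimming with Trim Galore
mkdir -p ${fq_dir}/trim
cd ${fq_dir}/trim
trim_galore -q 20 --trim1 --paired --fastqc ${fq_dir}/${sampleID}_R1.fastq.gz ${fq_dir}/${sampleID}_R2.fastq.gz
Trim Galore writes the trimmed read pairs to the current directory (${fq_dir}/trim) as ${sampleID}_R1_val_1.fq.gz and ${sampleID}_R2_val_2.fq.gz.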
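The commands above assume that the shell variables ${fq_dir}, ${wd} and ${sampleID} are set for each sample. The following is a hypothetical illustration and is not part of the original pipeline. It shows a small Python driver that builds the same FastQC and Trim Galore commands for every sample and records the file names Trim Galore produces by default for paired-end input. The driver uses Trim Galore's -o option instead of cd. The function names, sample IDs and paths are placeholders.

```python
import shlex


def qc_and_trim_commands(sample_id: str, fq_dir: str, wd: str) -> list[list[str]]:
    """Build the FastQC and Trim Galore command lines for one sample.

    fq_dir and wd stand in for the ${fq_dir} and ${wd} shell variables.
    Trim Galore's -o option replaces the `cd ${fq_dir}/trim` step.
    """
    r1 = f"{fq_dir}/{sample_id}_R1.fastq.gz"
    r2 = f"{fq_dir}/{sample_id}_R2.fastq.gz"
    qc_dir = f"{wd}/Quality_Assessment/{sample_id}"
    trim_dir = f"{fq_dir}/trim"
    return [
        ["mkdir", "-p", qc_dir, trim_dir],  # FastQC needs an existing --outdir
        ["fastqc", r1, r2, "--outdir", qc_dir],
        ["trim_galore", "-q", "20", "--trim1", "--paired", "--fastqc",
         "-o", trim_dir, r1, r2],
    ]


def trimmed_fastqs(sample_id: str, fq_dir: str) -> tuple[str, str]:
    """Default Trim Galore paired-end output names: *_val_1.fq.gz / *_val_2.fq.gz."""
    base = f"{fq_dir}/trim/{sample_id}"
    return f"{base}_R1_val_1.fq.gz", f"{base}_R2_val_2.fq.gz"


if __name__ == "__main__":
    # Hypothetical sample IDs and paths; print a dry run instead of executing.
    for sid in ["S01", "S02"]:
        for cmd in qc_and_trim_commands(sid, "/data/fastq", "/data/work"):
            print(shlex.join(cmd))
```

To execute the commands instead of printing them, each list could be passed to subprocess.run(cmd, check=True).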
3. Mapping with STAR
fasta="$wd/ref/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa"
gtf="$wd/ref/Homo_sapiens.GRCh37.75.gtf"
mkdir -p $wd/ref/$STARgenomeDir
STAR --runThreadN 5 --runMode genomeGenerate --genomeDir $wd/ref/$STARgenomeDir --genomeFastaFiles $fasta --sjdbGTFfile $gtf --limitGenomeGenerateRAM 50503721344 --limitSjdbInsertNsj 3000000 --limitIObufferSize 6368709120 --sjdbOverhang 99 --outFileNamePrefix $wd/ref/$STARgenomeDir/
mkdir -p $wd/mapping
sh star_mapping.sh ${fq_dir}/trim/${sampleID}_R1_val_1.fq.gz ${fq_dir}/trim/${sampleID}_R2_val_2.fq.gz $wd/ref/$STARgenomeDir ${num_thread} $wd/mapping
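STAR summarizes each alignment run in a Log.final.out file. The file name starts with the --outFileNamePrefix value. That prefix is set inside star_mapping.sh, which is not shown, so the exact path is not given here. The following is a minimal sketch for collecting mapping statistics per sample. It assumes the usual "<label> | <value>" layout of that file. The helper names are hypothetical, and the 70% cut-off in low_mapping is an illustrative choice, not a value from the manuscript.

```python
def parse_star_final_log(text: str) -> dict[str, str]:
    """Parse STAR's Log.final.out into a {metric label: value} dict.

    Metric lines look like '<label> |<TAB><value>'; section headers such as
    'UNIQUE READS:' contain no '|' and are skipped.
    """
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        label, value = line.split("|", 1)
        stats[label.strip()] = value.strip()
    return stats


def low_mapping(stats: dict[str, str], min_unique_pct: float = 70.0) -> bool:
    """Flag a sample whose uniquely mapped read percentage is below a cut-off.

    The 70% default is hypothetical, not taken from the manuscript.
    """
    unique_pct = float(stats["Uniquely mapped reads %"].rstrip("%"))
    return unique_pct < min_unique_pct
```

Passing the text of a Log.final.out file to parse_star_final_log returns a dict keyed by STAR's own labels, such as "Number of input reads" and "Uniquely mapped reads %". These labels can then be tabulated across samples.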
4. Quantify expression level with RSEM
mkdir -p $wd/ref/ref_RSEM
rsem-prepare-reference $wd/ref/ref_ensemble/ $wd/ref/ref_RSEM/Homo_sapiens.GRCh37.75 --gtf $gtf -p 5
sh RSEM.sh ${sampleID} $wd/mapping $wd/ref/ref_RSEM/Homo_sapiens.GRCh37.75 ${num_thread} $wd/RSEM/
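RSEM.sh is not shown. It presumably runs RSEM's quantification step, which writes per-sample *.genes.results and *.isoforms.results tables with expected_count, effective_length, TPM and FPKM columns. To make the TPM column concrete, the sketch below recomputes the standard definition:

TPM_i = 1e6 * (c_i / l_i) / sum_j (c_j / l_j)

Here c is the expected count and l is the effective length. RSEM derives TPM from its own abundance estimates, so values recomputed this way from the reported columns may differ slightly. The function name is hypothetical.

```python
def tpm(expected_counts: list[float], effective_lengths: list[float]) -> list[float]:
    """TPM_i = 1e6 * (c_i / l_i) / sum_j (c_j / l_j).

    Features with a non-positive effective length (shorter than the fragment
    length) are given a rate of 0 here.
    """
    if len(expected_counts) != len(effective_lengths):
        raise ValueError("counts and lengths must have the same length")
    rates = [c / l if l > 0 else 0.0 for c, l in zip(expected_counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [1e6 * r / total for r in rates]
```

Because the rates are normalized by their sum, TPM values always add up to one million within a sample. This makes them comparable across samples in a way that raw counts are not.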
5. Expression matrix: merging, filtering, PCA and PEER factors
python expr_merge.py
Rscript exp_distribu.r
python expr_detect.py
Rscript exp_distribu_filter.r
Rscript exp_pca.r
Rscript exp_peer.r
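The scripts in this step take no arguments and are not shown, so their inputs and filtering criteria are not documented here. The following is an illustration only and is not the authors' expr_merge.py or expr_detect.py. It sketches how per-sample RSEM *.genes.results tables can be merged into one gene-by-sample TPM matrix and then filtered to expressed genes. The filter keeps a gene if its TPM exceeds a threshold in a minimum fraction of samples. Both thresholds and all function names are hypothetical.

```python
import csv
from collections.abc import Iterable


def read_rsem_tpm(lines: Iterable[str]) -> dict[str, float]:
    """Read gene_id -> TPM from the lines of an RSEM *.genes.results file."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["gene_id"]: float(row["TPM"]) for row in reader}


def merge_samples(
    per_sample: dict[str, dict[str, float]],
) -> tuple[list[str], dict[str, list[float]]]:
    """Combine per-sample TPM dicts into (samples, {gene: [TPM per sample]}).

    Samples and genes are sorted; genes absent from a sample are filled with 0.0.
    """
    samples = sorted(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    matrix = {g: [per_sample[s].get(g, 0.0) for s in samples] for g in genes}
    return samples, matrix


def filter_expressed(matrix, min_tpm=0.1, min_frac=0.2):
    """Keep genes with TPM > min_tpm in at least min_frac of the samples.

    Both thresholds are hypothetical defaults, not the manuscript's values.
    """
    kept = {}
    for gene, values in matrix.items():
        n_pass = sum(v > min_tpm for v in values)
        if n_pass >= min_frac * len(values):
            kept[gene] = values
    return kept


def write_matrix(samples, matrix, handle):
    """Write a tab-separated gene x sample matrix with a header row."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_id", *samples])
    for gene, values in matrix.items():
        writer.writerow([gene, *values])
```

read_rsem_tpm accepts any iterable of lines, including an open file handle. The filtered matrix is the kind of input on which PCA and PEER factor estimation are typically run.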
6. ASE and aseQTL calling
Allele-specific expression (ASE) and aseQTL calling were performed with ASEkit, a Python package developed by our team.
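ASEkit's interface and statistics are not described here. For orientation only, the sketch below shows a generic way to test allelic imbalance at a single heterozygous site. It applies an exact two-sided binomial test to reference versus alternative read counts against an expected ratio of 0.5. This is not necessarily the test ASEkit implements; overdispersion-aware models such as the beta-binomial are also common for ASE. The function names and the min_depth cut-off are hypothetical.

```python
from math import exp, lgamma, log


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided p-value: total probability of outcomes no likelier than k."""
    observed = log_binom_pmf(k, n, p)
    tol = 1e-7  # so outcomes tied with k are counted despite rounding error
    total = 0.0
    for i in range(n + 1):
        lp = log_binom_pmf(i, n, p)
        if lp <= observed + tol:
            total += exp(lp)
    return min(1.0, total)


def ase_test(ref_count: int, alt_count: int, min_depth: int = 10):
    """Binomial test of ref vs alt read counts at one heterozygous site.

    Returns None below min_depth (a hypothetical cut-off); otherwise the
    two-sided p-value against the balanced expectation p = 0.5.
    """
    n = ref_count + alt_count
    if n < min_depth:
        return None
    return binom_two_sided_p(ref_count, n, 0.5)
```

Working in log space keeps the test usable at high read depths, where binomial coefficients would overflow a float.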
7. eQTL mapping with MatrixEQTL
The input files follow the format of the example data distributed with the R package "MatrixEQTL".
Rscript MatrixEQTL.r $geno $exp $cov $gene_loc $snploc $cis_res $trans_res
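In the MatrixEQTL input format, the genotype, expression and covariate files are tab-separated matrices. Each has one header row of sample IDs and one row per SNP, gene or covariate, and all three must list the same samples in the same column order. Genotypes are usually coded as 0/1/2 allele dosages. The minimal sketch below restricts the three tables to their shared samples and writes them in that layout. It writes "NA" for missing values and uses "id" as the corner label, as in the package's example files. The function names are hypothetical; the SNP and gene location files have no sample columns and are not covered.

```python
def shared_samples(*tables: dict[str, dict[str, object]]) -> list[str]:
    """Samples present in every table; each table maps feature -> {sample: value}.

    At least one table must be given.
    """
    per_table = [set().union(*rows.values()) for rows in tables]
    return sorted(set.intersection(*per_table))


def render_matrix(rows: dict[str, dict[str, object]], samples: list[str]) -> str:
    """Render one MatrixEQTL-style table: header 'id' + samples, one row per feature.

    Missing values are written as 'NA'.
    """
    lines = ["\t".join(["id", *samples])]
    for feature, values in rows.items():
        cells = [str(values.get(s, "NA")) for s in samples]
        lines.append("\t".join([feature, *cells]))
    return "\n".join(lines) + "\n"
```

Calling render_matrix on the genotype, expression and covariate tables with one shared sample list guarantees identical column order across the three files. MatrixEQTL relies on this order to match columns by position.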
8. Enrichment analyses
roadmap.enrich.py: both inputs (--cis and --bg) are two-column, tab-separated, BED-format files.
python2 roadmap.enrich.py --cis ${cis_loci} --bg ${bg_loci} --out ${prefix}
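roadmap.enrich.py is not shown, so its statistic is not documented here. One generic way to compare cis loci with background loci is the fold enrichment of overlap with an annotation. This is the fraction of cis loci that fall inside annotated intervals, divided by the same fraction for the background loci. The sketch assumes that each locus is a (chromosome, 0-based position) pair, which is only one reading of the "two-column" input. It also assumes that the annotation, for example one chromatin-state class, is given as BED intervals. All function names are hypothetical. A p-value could be added with Fisher's exact test on the resulting 2x2 counts.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(intervals):
    """Index BED intervals (chrom, start, end; 0-based, half-open) by chromosome.

    Overlapping or touching intervals are merged, so each lookup is one bisect.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        merged = []
        for start, end in sorted(ivs):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_annotation(index, chrom, pos):
    """True if the 0-based position lies inside an indexed interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]


def fold_enrichment(cis, background, index):
    """Fraction of cis loci in the annotation divided by the background fraction."""
    if not cis or not background:
        raise ValueError("cis and background loci must be non-empty")

    def fraction(loci):
        return sum(in_annotation(index, c, p) for c, p in loci) / len(loci)

    bg_frac = fraction(background)
    return fraction(cis) / bg_frac if bg_frac > 0 else float("inf")
```

Merging the intervals first means a locus never has to be checked against more than one candidate interval per chromosome.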
loc.enrich.py: the input (--cis) is a two-column, tab-separated, BED-format file.
python2 loc.enrich.py --cis ${cis_loci} --out ${prefix}
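The loci inputs are plain two-column, tab-separated text, so malformed lines are easy to introduce. Typical problems are space-separated fields, extra columns and non-numeric positions. The sketch below is a hypothetical validator and is not part of the pipeline. It assumes that the two columns are a chromosome name and an integer position.

```python
def read_two_column_loci(lines):
    """Parse 'chrom<TAB>position' lines into (chrom, int position) tuples.

    Blank lines and '#' comment lines are skipped. Any other line that is not
    exactly two tab-separated fields with an integer second field raises
    ValueError naming the offending line number.
    """
    loci = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 tab-separated columns, got {len(fields)}")
        chrom, pos = fields
        try:
            loci.append((chrom, int(pos)))
        except ValueError:
            raise ValueError(f"line {lineno}: position {pos!r} is not an integer") from None
    return loci
```

For example, the function can be called on an open file handle (with open(path) as fh: read_two_column_loci(fh)) before passing the file to --cis or --bg.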
GWAS.enrich.py: the first column of the input file contains the Ensembl IDs of the candidate genes.
python GWAS.enrich.py ${gene_file}
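The gene-level enrichment scripts read Ensembl IDs from the first column of their inputs. Gene lists taken from other sources may carry version suffixes (e.g. ENSG00000139618.14) or contain gene symbols instead of IDs. The sketch below is a hypothetical input-cleaning helper. It keeps human gene IDs (ENSG followed by 11 digits), drops version suffixes and removes duplicates while preserving order.

```python
import re

# Human Ensembl gene IDs: "ENSG" followed by 11 digits, optionally ".<version>".
ENSEMBL_GENE = re.compile(r"^(ENSG\d{11})(?:\.\d+)?$")


def read_gene_ids(lines, has_header=False):
    """Return unique Ensembl gene IDs from the first tab-separated column.

    Version suffixes (e.g. '.14') are dropped; input order is preserved.
    Any first-column value that is not an Ensembl gene ID raises ValueError.
    """
    ids = []
    for lineno, raw in enumerate(lines, start=1):
        if has_header and lineno == 1:
            continue
        line = raw.strip()
        if not line:
            continue
        first = line.split("\t")[0]
        match = ENSEMBL_GENE.match(first)
        if match is None:
            raise ValueError(f"line {lineno}: {first!r} is not an Ensembl gene ID")
        ids.append(match.group(1))
    return list(dict.fromkeys(ids))  # de-duplicate, keeping first occurrences
```

Failing loudly on symbols such as "BRCA2" is deliberate. Silently dropping them would shrink the candidate list without notice.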
gene_func.enrich.py: the first column of each input file contains the Ensembl IDs of the candidate or background genes, respectively.
python gene_func.enrich.py ${candidate_genes} ${background_genes} ${out}
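gene_func.enrich.py is not shown, so its functional categories and statistics are not documented here. A common way to test whether candidate genes are over-represented in a category relative to a background set is a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. A minimal sketch follows. The category gene set (for example the members of one GO term) and the function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(M genes, n in the category, N drawn)."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)  # exact integer arithmetic until this division


def category_enrichment(candidates, background, category):
    """Over-representation of one gene category among candidate genes.

    Candidates and category members are first restricted to the background,
    so all counts refer to the same gene universe.
    Returns (overlap, fold enrichment, one-sided p-value).
    """
    universe = set(background)
    cand = set(candidates) & universe
    cat = set(category) & universe
    k = len(cand & cat)
    M, n, N = len(universe), len(cat), len(cand)
    expected = N * n / M if M else 0.0
    fold = k / expected if expected else float("nan")
    return k, fold, hypergeom_sf(k, M, n, N)
```

When many categories are tested, the resulting p-values need multiple-testing correction, for example Benjamini-Hochberg.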
Zhilin Ning
Xinjiang Tan
Yuan Yuan
Ke Huang
Yuwen Pan
Lei Tian