data_overview_and_setup
The following details the steps involved in:
- Read quality assessment, quality trimming, and in silico normalization
- Generating a Trinity de novo RNA-Seq assembly
- Evaluating the quality of the Trinity assembly
- Quantifying transcript expression levels
For this course we will be using the data from this paper: “Defining the transcriptomic landscape of Candida glabrata by RNA-Seq”, Linde et al., Nucleic Acids Res. 2015.
The data are in this directory:
% ls ~/shared_ro/rnaseq_data/
GSNO_rep1_1.fastq ph8_rep1_1.fastq wt37_rep1_1.fastq
GSNO_rep1_2.fastq ph8_rep1_2.fastq wt37_rep1_2.fastq
GSNO_rep2_1.fastq ph8_rep2_1.fastq wt37_rep2_1.fastq
GSNO_rep2_2.fastq ph8_rep2_2.fastq wt37_rep2_2.fastq
GSNO_rep3_1.fastq ph8_rep3_1.fastq wt37_rep3_1.fastq
GSNO_rep3_2.fastq ph8_rep3_2.fastq wt37_rep3_2.fastq
These are paired-end RNA-Seq data for Candida glabrata grown under three conditions: wild type (wt37), alkaline conditions (ph8), and nitrosative challenge (GSNO). For this course, 2M reads were sampled from each of the original data sets, which contain more than 20M PE reads each. Each of the three conditions has three biological replicates.
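Because each data set was subsampled to a fixed number of reads, a quick sanity check is to count the reads in each file. The bash sketch below (the function name count_reads is hypothetical) assumes uncompressed FASTQ with one sequence line per record, so every record spans exactly four lines and the read count is the line count divided by four.

```shell
# Count reads in a FASTQ file. Assumes uncompressed FASTQ with one
# sequence line per record, so every record is exactly 4 lines
# (header, sequence, '+' separator, qualities) and reads = lines / 4.
count_reads() {
  local fq="$1"
  echo $(( $(wc -l < "$fq") / 4 ))
}

# Report the read count of every FASTQ file in the shared data directory.
for fq in ~/shared_ro/rnaseq_data/*.fastq; do
  [ -e "$fq" ] || continue   # skip the literal pattern if nothing matched
  printf '%s\t%s\n' "$(basename "$fq")" "$(count_reads "$fq")"
done
```

The two mates of a pair (the _1 and _2 files) should report the same count; a mismatch usually means a file was truncated.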
Before we begin any data processing, create a 'workspace' directory in your home area and change into it:
% mkdir ~/workspace
% cd ~/workspace
Next, create shortcuts (symbolic links) to these fastq files for the sake of convenience:
% ln -s ~/shared_ro/rnaseq_data/*fastq .
You can verify that the shortcuts were made by listing the files in long format:
% ls -l
lrwxrwxrwx 1 training training 54 Oct 3 16:13 GSNO_rep1_1.fastq -> /home/training/shared_ro/rnaseq_data/GSNO_rep1_1.fastq
lrwxrwxrwx 1 training training 54 Oct 3 16:13 GSNO_rep1_2.fastq -> /home/training/shared_ro/rnaseq_data/GSNO_rep1_2.fastq
lrwxrwxrwx 1 training training 54 Oct 3 16:13 GSNO_rep2_1.fastq -> /home/training/shared_ro/rnaseq_data/GSNO_rep2_1.fastq
lrwxrwxrwx 1 training training 54 Oct 3 16:13 GSNO_rep2_2.fastq -> /home/training/shared_ro/rnaseq_data/GSNO_rep2_2.fastq
lrwxrwxrwx 1 training training 54 Oct 3 16:13 GSNO_rep3_1.fastq -> /home/training/shared_ro/rnaseq_data/GSNO_rep3_1.fastq
lrwxrwxrwx 1 training training 54 Oct 3 16:13 GSNO_rep3_2.fastq -> /home/training/shared_ro/rnaseq_data/GSNO_rep3_2.fastq
lrwxrwxrwx 1 training training 53 Oct 3 16:13 ph8_rep1_1.fastq -> /home/training/shared_ro/rnaseq_data/ph8_rep1_1.fastq
lrwxrwxrwx 1 training training 53 Oct 3 16:13 ph8_rep1_2.fastq -> /home/training/shared_ro/rnaseq_data/ph8_rep1_2.fastq
lrwxrwxrwx 1 training training 53 Oct 3 16:13 ph8_rep2_1.fastq -> /home/training/shared_ro/rnaseq_data/ph8_rep2_1.fastq
lrwxrwxrwx 1 training training 53 Oct 3 16:13 ph8_rep2_2.fastq -> /home/training/shared_ro/rnaseq_data/ph8_rep2_2.fastq
lrwxrwxrwx 1 training training 53 Oct 3 16:13 ph8_rep3_1.fastq -> /home/training/shared_ro/rnaseq_data/ph8_rep3_1.fastq
lrwxrwxrwx 1 training training 53 Oct 3 16:13 ph8_rep3_2.fastq -> /home/training/shared_ro/rnaseq_data/ph8_rep3_2.fastq
lrwxrwxrwx 1 training training 54 Oct 3 16:13 wt37_rep1_1.fastq -> /home/training/shared_ro/rnaseq_data/wt37_rep1_1.fastq
lrwxrwxrwx 1 training training 54 Oct 3 16:13 wt37_rep1_2.fastq -> /home/training/shared_ro/rnaseq_data/wt37_rep1_2.fastq
lrwxrwxrwx 1 training training 54 Oct 3 16:13 wt37_rep2_1.fastq -> /home/training/shared_ro/rnaseq_data/wt37_rep2_1.fastq
lrwxrwxrwx 1 training training 54 Oct 3 16:13 wt37_rep2_2.fastq -> /home/training/shared_ro/rnaseq_data/wt37_rep2_2.fastq
lrwxrwxrwx 1 training training 54 Oct 3 16:13 wt37_rep3_1.fastq -> /home/training/shared_ro/rnaseq_data/wt37_rep3_1.fastq
lrwxrwxrwx 1 training training 54 Oct 3 16:13 wt37_rep3_2.fastq -> /home/training/shared_ro/rnaseq_data/wt37_rep3_2.fastq
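A long listing shows where each link points but not whether the target actually exists. The bash sketch below (check_links is a hypothetical helper) counts the FASTQ symlinks in a directory and flags any whose target is missing, which can happen if the source path was mistyped.

```shell
# Check that every *.fastq symlink in a directory resolves to an existing file.
# Prints a summary and returns non-zero if any link is broken.
check_links() {
  local dir="${1:-.}" link n_ok=0 n_broken=0
  for link in "$dir"/*.fastq; do
    [ -L "$link" ] || continue   # skip regular files and an unmatched glob
    if [ -e "$link" ]; then      # -e follows the link to its target
      n_ok=$((n_ok + 1))
    else
      echo "broken link: $link" >&2
      n_broken=$((n_broken + 1))
    fi
  done
  echo "ok=$n_ok broken=$n_broken"
  [ "$n_broken" -eq 0 ]
}

# Run in the workspace; all 18 links should be reported as ok.
check_links .
```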
For most of the workshop, we will be using these fastq files.
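Every file follows the naming scheme <condition>_rep<N>_<mate>.fastq, with _1 and _2 holding the two mates of a pair, so the left/right pairing can be derived from the file names alone. The bash sketch below (make_samples_table is a hypothetical helper) prints one tab-delimited line per replicate: condition, replicate, left reads, right reads. This four-column layout matches what Trinity's --samples_file option expects as far as we know, but check it against the Trinity documentation for your version.

```shell
# Pair each left (_1) FASTQ with its right (_2) mate and print a
# tab-delimited table: condition, replicate, left reads, right reads.
# Assumes file names of the form <condition>_rep<N>_<mate>.fastq.
make_samples_table() {
  local dir="${1:-.}" left right base cond
  for left in "$dir"/*_1.fastq; do
    [ -e "$left" ] || continue            # skip if the glob matched nothing
    right="${left%_1.fastq}_2.fastq"      # swap the mate suffix
    if [ ! -e "$right" ]; then
      echo "missing mate for $left" >&2
      return 1
    fi
    base=$(basename "$left" _1.fastq)     # e.g. GSNO_rep1
    cond="${base%%_rep*}"                 # e.g. GSNO
    printf '%s\t%s\t%s\t%s\n' "$cond" "$base" "$left" "$right"
  done
}
```

Running `make_samples_table . > samples.txt` in the workspace should produce nine rows, one per biological replicate (three conditions with three replicates each).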
For convenience, we'll use a preset TRINITY_HOME environment variable that points to the directory where Trinity is installed:
% echo $TRINITY_HOME
/usr/local/src/trinityrnaseq-Trinity-v2.6.6
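An unset or mistyped TRINITY_HOME can surface later as a confusing "No such file or directory" error, so it may be worth checking up front. The bash sketch below (check_trinity_home is a hypothetical helper) assumes the main Trinity launcher script sits at the top level of the installation directory, as in the trinityrnaseq source release shown above.

```shell
# Verify that TRINITY_HOME is set, exists, and contains an executable
# 'Trinity' launcher at its top level (assumed layout of the release).
check_trinity_home() {
  if [ -z "${TRINITY_HOME:-}" ]; then
    echo "TRINITY_HOME is not set" >&2
    return 1
  fi
  if [ ! -d "$TRINITY_HOME" ]; then
    echo "TRINITY_HOME points to a missing directory: $TRINITY_HOME" >&2
    return 1
  fi
  if [ ! -x "$TRINITY_HOME/Trinity" ]; then
    echo "no executable Trinity script in $TRINITY_HOME" >&2
    return 1
  fi
  echo "using Trinity from $TRINITY_HOME"
}
```

Calling `check_trinity_home` at the start of a session should print the installation path echoed above.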