data_overview_and_setup

Brian Haas edited this page Jun 7, 2018 · 2 revisions

NGS read quality assessment, transcriptome assembly, and read mapping

The following details the steps involved in:

  • Read quality assessment, quality trimming, and in silico normalization
  • Generating a Trinity de novo RNA-Seq assembly
  • Evaluating the quality of the Trinity assembly
  • Quantifying transcript expression levels

Data Overview

For this course we will be using data from the paper “Defining the transcriptomic landscape of Candida glabrata by RNA-Seq”, Linde et al., Nucleic Acids Res. 2015.

The data are in this directory:

% ls ~/shared_ro/rnaseq_data/

GSNO_rep1_1.fastq  ph8_rep1_1.fastq  wt37_rep1_1.fastq
GSNO_rep1_2.fastq  ph8_rep1_2.fastq  wt37_rep1_2.fastq
GSNO_rep2_1.fastq  ph8_rep2_1.fastq  wt37_rep2_1.fastq
GSNO_rep2_2.fastq  ph8_rep2_2.fastq  wt37_rep2_2.fastq
GSNO_rep3_1.fastq  ph8_rep3_1.fastq  wt37_rep3_1.fastq
GSNO_rep3_2.fastq  ph8_rep3_2.fastq  wt37_rep3_2.fastq

These are paired-end RNA-Seq data corresponding to three growth conditions for Candida glabrata: wild type (wt37), alkaline conditions (ph8), and nitrosative challenge (GSNO). 2M reads were sampled from the original data sets (each containing >20M PE reads). Each of the three conditions has three biological replicates.
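
A quick way to confirm the size of each subsampled file is to count its records. The sketch below assumes standard, unwrapped FASTQ with exactly four lines per read; `count_reads` is a helper name introduced here for illustration and is not part of Trinity or the course materials.

```shell
# Count reads in a FASTQ file, assuming four lines per record
# (header, sequence, '+', qualities) with no line wrapping.
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Report the read count for each read-1 file in the shared data directory.
for fq in ~/shared_ro/rnaseq_data/*_1.fastq; do
    [ -e "$fq" ] || continue   # skip if the glob matched nothing
    printf '%s\t%s\n' "$(basename "$fq")" "$(count_reads "$fq")"
done
```

For properly paired data, the matching _2.fastq file should report the same count as its _1.fastq partner.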

Setting up your workspace

Before we begin any data processing, create a 'workspace' directory in your home area, and change to that directory like so:

% mkdir workspace
% cd workspace

Next, create shortcuts (symbolic links) to these fastq files for the sake of convenience:

% ln -s ~/shared_ro/rnaseq_data/*fastq .

You can verify that the shortcuts were made by listing the files in long format:

% ls -l 

lrwxrwxrwx 1 training training 54 Oct  3 16:13 GSNO_rep1_1.fastq -> /home/training/shared_ro/rnaseq_data/GSNO_rep1_1.fastq
lrwxrwxrwx 1 training training 54 Oct  3 16:13 GSNO_rep1_2.fastq -> /home/training/shared_ro/rnaseq_data/GSNO_rep1_2.fastq
lrwxrwxrwx 1 training training 54 Oct  3 16:13 GSNO_rep2_1.fastq -> /home/training/shared_ro/rnaseq_data/GSNO_rep2_1.fastq
lrwxrwxrwx 1 training training 54 Oct  3 16:13 GSNO_rep2_2.fastq -> /home/training/shared_ro/rnaseq_data/GSNO_rep2_2.fastq
lrwxrwxrwx 1 training training 54 Oct  3 16:13 GSNO_rep3_1.fastq -> /home/training/shared_ro/rnaseq_data/GSNO_rep3_1.fastq
lrwxrwxrwx 1 training training 54 Oct  3 16:13 GSNO_rep3_2.fastq -> /home/training/shared_ro/rnaseq_data/GSNO_rep3_2.fastq
lrwxrwxrwx 1 training training 53 Oct  3 16:13 ph8_rep1_1.fastq -> /home/training/shared_ro/rnaseq_data/ph8_rep1_1.fastq
lrwxrwxrwx 1 training training 53 Oct  3 16:13 ph8_rep1_2.fastq -> /home/training/shared_ro/rnaseq_data/ph8_rep1_2.fastq
lrwxrwxrwx 1 training training 53 Oct  3 16:13 ph8_rep2_1.fastq -> /home/training/shared_ro/rnaseq_data/ph8_rep2_1.fastq 
lrwxrwxrwx 1 training training 53 Oct  3 16:13 ph8_rep2_2.fastq -> /home/training/shared_ro/rnaseq_data/ph8_rep2_2.fastq
lrwxrwxrwx 1 training training 53 Oct  3 16:13 ph8_rep3_1.fastq -> /home/training/shared_ro/rnaseq_data/ph8_rep3_1.fastq
lrwxrwxrwx 1 training training 53 Oct  3 16:13 ph8_rep3_2.fastq -> /home/training/shared_ro/rnaseq_data/ph8_rep3_2.fastq
lrwxrwxrwx 1 training training 54 Oct  3 16:13 wt37_rep1_1.fastq -> /home/training/shared_ro/rnaseq_data/wt37_rep1_1.fastq
lrwxrwxrwx 1 training training 54 Oct  3 16:13 wt37_rep1_2.fastq -> /home/training/shared_ro/rnaseq_data/wt37_rep1_2.fastq
lrwxrwxrwx 1 training training 54 Oct  3 16:13 wt37_rep2_1.fastq -> /home/training/shared_ro/rnaseq_data/wt37_rep2_1.fastq
lrwxrwxrwx 1 training training 54 Oct  3 16:13 wt37_rep2_2.fastq -> /home/training/shared_ro/rnaseq_data/wt37_rep2_2.fastq
lrwxrwxrwx 1 training training 54 Oct  3 16:13 wt37_rep3_1.fastq -> /home/training/shared_ro/rnaseq_data/wt37_rep3_1.fastq
lrwxrwxrwx 1 training training 54 Oct  3 16:13 wt37_rep3_2.fastq -> /home/training/shared_ro/rnaseq_data/wt37_rep3_2.fastq

For most of the workshop, we will be using these fastq files.
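
The links can also be checked without reading the listing by eye. The following is a minimal sketch, assuming the `_1.fastq` / `_2.fastq` naming shown above; `check_pairs` is a hypothetical helper written for this page, not a Trinity utility.

```shell
# Report any read-1 file whose link is broken or whose read-2 mate is missing.
# Usage: check_pairs [directory]   (defaults to the current directory)
check_pairs() {
    dir=${1:-.}
    status=0
    for left in "$dir"/*_1.fastq; do
        right="${left%_1.fastq}_2.fastq"
        # -e follows symbolic links, so a dangling link counts as missing
        if [ ! -e "$left" ]; then
            echo "missing or broken: $left" >&2
            status=1
        elif [ ! -e "$right" ]; then
            echo "no usable mate for $left (expected $right)" >&2
            status=1
        fi
    done
    return $status
}
```

Running `check_pairs .` from the workspace directory returns success only if every read-1 link resolves and has a resolvable read-2 partner; any problems are printed to standard error.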

Environment Setup

For convenience, we'll use a preset TRINITY_HOME environment variable that points to the directory where Trinity is installed:

% echo $TRINITY_HOME

/usr/local/src/trinityrnaseq-Trinity-v2.6.6
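
If the echo prints an empty line, the variable is not set in the current shell. As a small sketch, the check can be made explicit; `check_trinity_home` is an illustrative name introduced here, and the function only tests that the variable is set and points to an existing directory, not that Trinity itself runs.

```shell
# Confirm that TRINITY_HOME is set and names an existing directory.
check_trinity_home() {
    if [ -z "${TRINITY_HOME:-}" ]; then
        echo "TRINITY_HOME is not set" >&2
        return 1
    fi
    if [ ! -d "$TRINITY_HOME" ]; then
        echo "TRINITY_HOME is not a directory: $TRINITY_HOME" >&2
        return 1
    fi
    echo "TRINITY_HOME=$TRINITY_HOME"
}
```

Calling `check_trinity_home` before starting the workshop steps surfaces a missing or mistyped path early, rather than partway through a later command.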