Schedule for the single-cell RNA-seq data analysis workshop (HSPH)

Pre-reading

Introduction to scRNA-seq
Raw data to count matrix
Download this project

Day 1

Time	Topic	Instructor
13:30 - 13:45	Workshop introduction	Radhika
13:45 - 15:00	Single-cell RNA-seq design and methods	Dr. Mandovi Chatterjee
15:00 - 15:05	Break
15:05 - 15:15	scRNA-seq pre-reading discussion	Jihe
15:15 - 15:55	Quality control set-up	Radhika
15:55 - 16:00	Overview of self-learning materials and homework submission	Jihe

Before the next class:

I. Please study the contents and work through all the code within the following lessons:

Quality control

Before you start any analysis, it’s important to know whether or not you have good quality cells. At these early stages you can flag or remove samples that could produce erroneous results downstream.

In this lesson you will:
- Compute essential QC metrics for each sample
- Create plots to visualize metrics per sample
- Critically evaluate each plot and learn what each QC metric means
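
As a rough illustration of what per-cell QC metrics look like, the sketch below computes three commonly used ones (total UMI count, number of detected genes, and the fraction of counts from mitochondrial genes) on a made-up count matrix. The lessons themselves use R and Seurat; this plain-Python version is only a conceptual aid. The gene counts, the `qc_metrics` function, and the use of the `MT-` prefix to spot human mitochondrial genes are all assumptions made for the example.

```python
# Toy count matrix: keys are genes, list positions are cells (all values are made up).
counts = {
    "MT-CO1": [5, 40, 0],
    "MT-ND1": [5, 20, 0],
    "ACTB": [50, 30, 2],
    "CD3E": [40, 10, 0],
}
cell_ids = ["cell_1", "cell_2", "cell_3"]


def qc_metrics(counts, cell_ids, mito_prefix="MT-"):
    """Return per-cell UMI total, number of detected genes, and mitochondrial ratio."""
    metrics = {}
    for j, cell in enumerate(cell_ids):
        n_umi = sum(row[j] for row in counts.values())
        n_genes = sum(1 for row in counts.values() if row[j] > 0)
        n_mito = sum(
            row[j] for gene, row in counts.items() if gene.startswith(mito_prefix)
        )
        metrics[cell] = {
            "n_umi": n_umi,
            "n_genes": n_genes,
            # Guard against empty cells to avoid dividing by zero.
            "mito_ratio": n_mito / n_umi if n_umi else 0.0,
        }
    return metrics


metrics = qc_metrics(counts, cell_ids)
for cell, values in metrics.items():
    print(cell, values)
```

In this toy data, `cell_2` has a high mitochondrial ratio and `cell_3` has very few UMIs and genes, which are the kinds of patterns the QC plots in the lesson are meant to reveal.
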

Overview of Clustering Workflow

QC is complete, what's next?

In this lesson you will get a brief overview of the next steps in the scRNA-seq analysis workflow. It's good to have a big-picture understanding before we get into the nitty-gritty details!

Theory of Normalization and PCA

Before we can begin the next steps of the workflow, we need to make sure you have a good understanding of two important concepts: normalization and Principal Components Analysis (PCA). These are two methods that will be utilized in the scRNA-seq analysis workflow, and this foundation will help you better navigate those steps.

Normalization and regressing out unwanted variation

During the analysis we will be making lots of comparisons: between cells, between samples, or both. To make accurate comparisons of gene expression, we first need to perform normalization. We also want to make sure that the differences we find are a true biological effect and not a result of other sources of unwanted variation.

In this lesson you will:
- Assess your data for any unwanted variation
- Normalize the data while also regressing out any identified sources of unwanted variation
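
One simple way to see why normalization matters is log-normalization: divide each count by the cell's total, multiply by a scale factor, and take log(1 + x). The sketch below is a plain-Python illustration under stated assumptions, not necessarily the method used in the lesson, and it does not show how unwanted variation is regressed out. The function name `log_normalize`, the toy counts, and the scale factor of 10,000 are choices made for this example.

```python
import math


def log_normalize(cell_counts, scale_factor=10_000):
    """Scale one cell's counts by its total, then apply log(1 + x)."""
    total = sum(cell_counts)
    if total == 0:
        # An empty cell has nothing to scale; return zeros.
        return [0.0 for _ in cell_counts]
    return [math.log1p(count / total * scale_factor) for count in cell_counts]


# Two hypothetical cells with the same expression proportions
# but a tenfold difference in sequencing depth.
shallow_cell = [1, 3, 6]     # 10 UMIs in total
deep_cell = [10, 30, 60]     # 100 UMIs in total

normalized_shallow = log_normalize(shallow_cell)
normalized_deep = log_normalize(deep_cell)

print(normalized_shallow)
print(normalized_deep)
```

Before normalization the deep cell appears to express every gene ten times more strongly; after normalization the two cells have matching values, because the difference was due to sequencing depth rather than biology.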

II. Complete the exercises:

- Each lesson above contains exercises; please go through each of them.
- Copy over your R code from the exercises to this (downloadable) R script.
- Upload the saved R script file to Dropbox the day before the next class.

III. Run the code in this script to perform the steps of integration. We will discuss the code and theory in class.

Questions?

If you get stuck due to an error while running code in the lesson, email us.
Post any conceptual questions that you would like to have reviewed in class here.

Day 2

Time	Topic	Instructor
13:30 - 14:40	Self-learning lessons discussion	All
14:40 - 14:45	Break
14:45 - 16:00	Integration	Radhika

Before the next class:

I. Please study the contents and work through all the code within the following lessons:

Clustering

From the UMAP visualization of our data we can see that the cells are positioned into groups. Our next task is to isolate clusters of cells that are most similar to one another based on gene expression.

In this lesson you will:
- Learn the theory behind clustering and how it is performed in Seurat
- Cluster cells and visualize them on the UMAP

Clustering quality control

After separating cells into clusters, it is critical to evaluate whether they are biologically meaningful or not. At this point we can also decide if we need to re-cluster and/or potentially go back to a previous QC step.

In this lesson you will:
- Check to see that clusters are not influenced by uninteresting sources of variation
- Check to see whether the major principal components are driving the different clusters
- Explore the cell type identities by looking at the expression for known markers across the clusters

Marker identification

By this point, you have defined most of your clusters as representative populations of particular cell types. However, there may still be some uncertainty and/or unknowns. This step in the workflow is about using the gene expression data to identify genes that exhibit a significantly higher (or lower) level of expression for a particular cluster of cells.

In this lesson, we identify these lists of genes and use them to:
- Verify the identity of certain clusters
- Help surmise the identity of any unknown clusters
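
To make the idea of "higher (or lower) expression for a particular cluster" concrete, the sketch below ranks genes by the log2 fold change of their mean expression inside one cluster versus all other cells. It is a deliberately simplified plain-Python illustration: it performs no statistical testing, so it cannot say whether a difference is significant, and it is not the procedure used in the lesson. The expression values, cluster labels, function names, and pseudocount are all assumptions made for the example.

```python
import math

# Hypothetical normalized expression values: one list per gene, one entry per cell.
expression = {
    "CD3E": [5.0, 6.0, 0.0, 1.0],
    "MS4A1": [0.0, 1.0, 7.0, 8.0],
}
# Hypothetical cluster label for each cell, in the same order as the values above.
clusters = ["A", "A", "B", "B"]


def log2_fold_change(values, clusters, target, pseudocount=1.0):
    """Log2 ratio of mean expression in the target cluster versus all other cells.

    Assumes at least one cell inside and one cell outside the target cluster.
    """
    inside = [v for v, c in zip(values, clusters) if c == target]
    outside = [v for v, c in zip(values, clusters) if c != target]
    mean_in = sum(inside) / len(inside)
    mean_out = sum(outside) / len(outside)
    # The pseudocount keeps the ratio finite when a mean is zero.
    return math.log2((mean_in + pseudocount) / (mean_out + pseudocount))


def rank_markers(expression, clusters, target):
    """Return (gene, log2 fold change) pairs, highest fold change first."""
    scores = {
        gene: log2_fold_change(values, clusters, target)
        for gene, values in expression.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked_for_a = rank_markers(expression, clusters, "A")
for gene, score in ranked_for_a:
    print(gene, round(score, 2))
```

Genes with a large positive score are candidate markers for the cluster, while genes with a large negative score are depleted in it relative to the other cells.
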

II. Complete the exercises:

- Each lesson above contains exercises; please go through each of them.
- Copy over your R code from the exercises to this (downloadable) R script.
- Upload the saved R script file to Dropbox the day before the next class.

Questions?

If you get stuck due to an error while running code in the lesson, email us.
Post any conceptual questions that you would like to have reviewed in class here.

Day 3

Time	Topic	Instructor
13:30 - 14:30	Self-learning lessons discussion	All
14:30 - 14:40	Workflow summary	Radhika
14:40 - 14:45	Break
14:45 - 15:30	Discussion, Final Q & A	All
15:30 - 16:00	Wrap up	Jihe

Answer Keys

Answer key - assignment #1
Answer key - assignment #2

Downstream analyses

Differential expression between conditions

Resources

We have covered the analysis steps in quite a bit of detail for scRNA-seq exploration of cellular heterogeneity using the Seurat package. For more information on topics covered, we encourage you to take a look at the following resources:

Seurat vignettes
Seurat cheatsheet
Satija Lab: Single Cell Genomics Day
"Principal Component Analysis (PCA) clearly explained", a video from Josh Starmer
Additional information about cell cycle scoring
Using R on the O2 cluster
Highlighted papers for sample processing steps (pre-sequencing):
- "Sampling time-dependent artifacts in single-cell genomics studies." Massoni-Badosa et al. 2019
- "Dissociation of solid tumor tissues with cold active protease for single-cell RNA-seq minimizes conserved collagenase-associated stress responses." O'Flanagan et al. 2020
- "Systematic assessment of tissue dissociation and storage biases in single-cell and single-nucleus RNA-seq workflows." Denisenko et al. 2020
Azimuth reference-based analysis
CellMarker resource
Highlighted papers for single-nuclei RNA-seq:
- Single-nucleus and single-cell transcriptomes compared in matched cortical cell types
- A single-cell and single-nucleus RNA-Seq toolbox for fresh and frozen human tumors
Ligand-receptor analysis with CellphoneDB

Building on this workshop

Other online scRNA-seq courses:
Resources for scRNA-seq Sample Prep:
