Merge pull request #118 from ARTbio/update_IOC_singlecell
Update IOC single cell
bellenger-l authored Apr 24, 2024
2 parents c9e0386 + 023f75f commit 7f7f36f
Showing 27 changed files with 393 additions and 114 deletions.
57 changes: 57 additions & 0 deletions docs/scRNAseq_basics/00_IOCsc_week0.md
@@ -0,0 +1,57 @@
## First Steps with Seurat

Please read the following pages:

- [Introduction to Single-Cell RNAseq analysis](introduction.md)
- [Initialization of R analysis](intro_seurat.md)
- [Import data and initialization of Seurat object](import.md)

---

![](../R-IOC/images/toolbox-do-it-yourself.png){: style="width:75px"} **Do it yourself!**

Now it's your time to shine! We are going to put into practice what we
have just seen. Using a more complicated use case, you are going to
reproduce the whole scRNAseq analysis with Seurat.

# Test dataset

The dataset for this analysis is single-cell RNAseq data from zebrafish embryos
published by [Metikala *et al.*](https://doi.org/10.1371/journal.pone.0254024). You can download
the dataset from GEO under accession [GSE152982](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE152982).

??? question "Do you need help finding the data?"
    ??? tip "First tip:"
        Look at the supplementary files.
    ??? tip "Second tip:"
        To see what's inside the tar archive, you can click on `(custom)`.

Once you have downloaded the data, all you have to do is import it onto the RStudio server.
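
Before creating any object, it helps to check what the downloaded archive actually contains. The following is only a minimal sketch in base R, assuming the supplementary data arrive as a tar archive; the helper name `extract_archive` and the paths in the usage comment are hypothetical placeholders.

```r
# Hypothetical helper: list the content of a tar archive, then extract it.
extract_archive <- function(tar_path, out_dir) {
  stopifnot(file.exists(tar_path))
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  content <- untar(tar_path, list = TRUE)   # file names stored in the archive
  untar(tar_path, exdir = out_dir)          # extract everything into out_dir
  content
}

# Usage (placeholder paths, adapt them to your own download):
# files <- extract_archive("path/to/archive.tar", "data/zebrafish")
# files
```

Returning the list of file names makes it easy to see which files belong to which sample before importing them.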

## biomaRt is your friend

Don't forget to import biomaRt in order to help you annotate your genes.
Make sure to tweak parameters to fit this new dataset.

!!! question "Trouble using biomaRt?"
    Here are some links to help you with biomaRt:

    - [Vignette of the R package](https://bioconductor.org/packages/release/bioc/vignettes/biomaRt/inst/doc/accessing_ensembl.html)
    - [Short presentation](https://docs.google.com/presentation/d/1ck41d_0a6bMEreTfeeES67RExEc2pp_OXQvqbJ3ZdhU/edit?usp=sharing) on using the R package, comparing it with the Ensembl interface
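
As a starting point, a gene annotation query could look like the sketch below. It is not the reference solution: it assumes the Ensembl zebrafish dataset is named `drerio_gene_ensembl` (this can be checked with `listDatasets()`), the chosen attributes are only illustrative, and the function name `get_zebrafish_annotation` is hypothetical. The query needs an internet connection.

```r
# Illustrative sketch: the attribute list is an assumption and should be
# adapted to what you need for your own analysis.
get_zebrafish_annotation <- function(
    attributes = c("ensembl_gene_id", "external_gene_name", "chromosome_name")) {
  # Connect to the Ensembl "genes" mart for zebrafish (Danio rerio)
  mart <- biomaRt::useEnsembl(biomart = "genes",
                              dataset = "drerio_gene_ensembl")
  # Retrieve one row per gene/attribute combination as a data.frame
  biomaRt::getBM(attributes = attributes, mart = mart)
}

# annotated_genes <- get_zebrafish_annotation()
# head(annotated_genes)
```

Wrapping the query in a function keeps the dataset name and the attributes in one place, which makes them easy to tweak.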


# Render your RMD/QMD

To complete this week, you'll need to:

- [x] 1. Retrieve the zebrafish dataset
- [x] 2. Import the data into RStudio
- [x] 3. Import the data into your global environment
- [x] 4. Create a Seurat object
- [x] 5. Create an annotation table of zebrafish genes using `biomaRt`.

Add your RMD/QMD to your Trello card.

**Thank you for your attention and see you next week :clap: :clap: :clap:**

----
46 changes: 46 additions & 0 deletions docs/scRNAseq_basics/01_IOCsc_week1.md
@@ -0,0 +1,46 @@
## Data Preprocessing

Preprocessing is the most important part of a single-cell analysis: you
can skew your results if you filter too much **or too little**, so you must really
understand what is going on in these steps.

Preprocessing is composed of:

- Filtering of low quality barcodes
- Barcode normalization
- Selection of most variable features

Please go read the [preprocessing](preprocessing.md) pages to learn more about it.
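
The three steps listed above can be sketched with Seurat as follows. This is a hedged outline rather than the course's own code: the function names `select_cells` and `preprocess` are hypothetical, `mito_genes` is assumed to be a vector of mitochondrial gene identifiers present in the object, and the cutoffs are deliberately left as arguments because choosing and justifying them is part of the analysis.

```r
# Pure helper: decide which barcodes pass the quality filters.
# `meta` is a data.frame with the columns nFeature_RNA and percent.mt.
select_cells <- function(meta, min_features, max_features, max_mt) {
  keep <- meta$nFeature_RNA >= min_features &
    meta$nFeature_RNA <= max_features &
    meta$percent.mt <= max_mt
  rownames(meta)[keep]
}

# Sketch of the three preprocessing steps on a Seurat object.
preprocess <- function(object, mito_genes, min_features, max_features, max_mt) {
  # Percentage of counts coming from mitochondrial genes, per barcode
  object[["percent.mt"]] <- Seurat::PercentageFeatureSet(object, features = mito_genes)
  cells <- select_cells(object@meta.data, min_features, max_features, max_mt)
  object <- subset(object, cells = cells)   # 1. filter low-quality barcodes
  object <- Seurat::NormalizeData(object)   # 2. normalize each barcode
  Seurat::FindVariableFeatures(object)      # 3. select the most variable features
}

# Usage (placeholder cutoffs, to be replaced by values you can justify):
# seurat_obj <- preprocess(seurat_obj, mito_genes, 200, 2500, 5)
```

Keeping the filtering rule in a separate helper makes it possible to check on the metadata alone how many barcodes a given set of cutoffs would remove.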

---

![](../R-IOC/images/toolbox-do-it-yourself.png){: style="width:75px"} **Do it yourself!**

Now it's your time to shine! We are going to put into practice what we
have just seen. Using a more complicated use case, you are going to
reproduce the whole scRNAseq analysis with Seurat.

# Test dataset

The dataset for this analysis is single-cell RNAseq data from zebrafish embryos
published by [Metikala *et al.*](https://doi.org/10.1371/journal.pone.0254024). You need
to continue your RMD/QMD from last week.

# Render your RMD/QMD

To complete this week, you'll need to:

- [x] 1. Filter the low-quality barcodes **and explain each cutoff**
- [x] 2. Normalize the data
- [x] 3. Identify the most variable genes

!!! warning "IMPORTANT"
    Please note that you **must** justify your cutoff choices and detail
    each step of your reasoning.
    In general, try to explain each step of your analysis in your own words!

Add your RMD/QMD to your Trello card.

**Thank you for your attention and see you next week :clap: :clap: :clap:**

----
13 changes: 13 additions & 0 deletions docs/scRNAseq_basics/02_IOCsc_week2.md
@@ -0,0 +1,13 @@



---

![](../R-IOC/images/toolbox-do-it-yourself.png){: style="width:75px"} **Do it yourself!**




**Thank you for your attention and see you next week :clap: :clap: :clap:**

----
13 changes: 13 additions & 0 deletions docs/scRNAseq_basics/03_IOCsc_week3.md
@@ -0,0 +1,13 @@



---

![](../R-IOC/images/toolbox-do-it-yourself.png){: style="width:75px"} **Do it yourself!**




**Thank you for your attention and see you next week :clap: :clap: :clap:**

----
13 changes: 13 additions & 0 deletions docs/scRNAseq_basics/04_IOCsc_week4.md
@@ -0,0 +1,13 @@



---

![](../R-IOC/images/toolbox-do-it-yourself.png){: style="width:75px"} **Do it yourself!**




**Thank you for your attention and see you next week :clap: :clap: :clap:**

----
13 changes: 13 additions & 0 deletions docs/scRNAseq_basics/05_IOCsc_week5.md
@@ -0,0 +1,13 @@



---

![](../R-IOC/images/toolbox-do-it-yourself.png){: style="width:75px"} **Do it yourself!**




**Thank you for your attention and see you next week :clap: :clap: :clap:**

----
13 changes: 13 additions & 0 deletions docs/scRNAseq_basics/06_IOCsc_week6.md
@@ -0,0 +1,13 @@



---

![](../R-IOC/images/toolbox-do-it-yourself.png){: style="width:75px"} **Do it yourself!**




**Thank you for your attention and see you next week :clap: :clap: :clap:**

----
13 changes: 13 additions & 0 deletions docs/scRNAseq_basics/07_IOCsc_week7.md
@@ -0,0 +1,13 @@



---

![](../R-IOC/images/toolbox-do-it-yourself.png){: style="width:75px"} **Do it yourself!**




**Thank you for your attention and see you next week :clap: :clap: :clap:**

----
2 changes: 1 addition & 1 deletion docs/scRNAseq_basics/clustering.md
@@ -23,7 +23,7 @@ represents our cell populations.
pbmc_small <- FindNeighbors(pbmc_small, #SeuratObject
reduction = "pca", #Reduction to use
k.param = 20,
- dims = 1:10) #Number of PCs to keep (previously determined)
+ dims = 1:pc_to_keep) #Number of PCs to keep (previously determined)

pbmc_small <- FindClusters(pbmc_small, #SeuratObject
resolution = seq(from = 0.2, to = 1.2, by = 0.2), #Compute clustering with several resolutions (from 0.2 to 1.2 : values usually used)
Binary file modified docs/scRNAseq_basics/images/Clustree-1.png
Binary file modified docs/scRNAseq_basics/images/ElbowPlot-1.png
Binary file modified docs/scRNAseq_basics/images/MitoGenes-1.png
Binary file modified docs/scRNAseq_basics/images/MitoGenes-2.png
Binary file modified docs/scRNAseq_basics/images/PCA-1.png
Binary file modified docs/scRNAseq_basics/images/QCFilter-1.png
Binary file modified docs/scRNAseq_basics/images/SetIdents-1.png
Binary file modified docs/scRNAseq_basics/images/UMAP-1.png
Binary file modified docs/scRNAseq_basics/images/VariableFeature-1.png
Binary file modified docs/scRNAseq_basics/images/plotJackStraw-1.png
Binary file modified docs/scRNAseq_basics/images/visualMarkers-1.png
Binary file modified docs/scRNAseq_basics/images/visualMarkers-2.png
115 changes: 60 additions & 55 deletions docs/scRNAseq_basics/import.md
@@ -232,31 +232,41 @@ pbmc_small <- CreateSeuratObject(tenX_matrix, #Expression mat

## Formal class 'Seurat' [package "SeuratObject"] with 13 slots
## ..@ assays :List of 1
## .. ..$ RNA:Formal class 'Assay' [package "SeuratObject"] with 8 slots
## .. .. .. ..@ counts :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
## .. .. .. .. .. ..@ i : int [1:2286884] 70 166 178 326 363 410 412 492 494 495 ...
## .. .. .. .. .. ..@ p : int [1:2701] 0 781 2133 3264 4224 4746 5528 6311 7101 7634 ...
## .. .. .. .. .. ..@ Dim : int [1:2] 32738 2700
## .. .. .. .. .. ..@ Dimnames:List of 2
## .. .. .. .. .. .. ..$ : chr [1:32738] "ENSG00000243485" "ENSG00000237613" "ENSG00000186092" "ENSG00000238009" ...
## .. ..$ RNA:Formal class 'Assay5' [package "SeuratObject"] with 8 slots
## .. .. .. ..@ layers :List of 1
## .. .. .. .. ..$ counts:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
## .. .. .. .. .. .. ..@ i : int [1:2286884] 70 166 178 326 363 410 412 492 494 495 ...
## .. .. .. .. .. .. ..@ p : int [1:2701] 0 781 2133 3264 4224 4746 5528 6311 7101 7634 ...
## .. .. .. .. .. .. ..@ Dim : int [1:2] 32738 2700
## .. .. .. .. .. .. ..@ Dimnames:List of 2
## .. .. .. .. .. .. .. ..$ : NULL
## .. .. .. .. .. .. .. ..$ : NULL
## .. .. .. .. .. .. ..@ x : num [1:2286884] 1 1 2 1 1 1 1 41 1 1 ...
## .. .. .. .. .. .. ..@ factors : list()
## .. .. .. ..@ cells :Formal class 'LogMap' [package "SeuratObject"] with 1 slot
## .. .. .. .. .. ..@ .Data: logi [1:2700, 1] TRUE TRUE TRUE TRUE TRUE TRUE ...
## .. .. .. .. .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. .. .. .. .. ..$ : chr [1:2700] "AAACATACAACCAC-1" "AAACATTGAGCTAC-1" "AAACATTGATCAGC-1" "AAACCGTGCTTCCG-1" ...
## .. .. .. .. .. .. .. ..$ : chr "counts"
## .. .. .. .. .. ..$ dim : int [1:2] 2700 1
## .. .. .. .. .. ..$ dimnames:List of 2
## .. .. .. .. .. .. ..$ : chr [1:2700] "AAACATACAACCAC-1" "AAACATTGAGCTAC-1" "AAACATTGATCAGC-1" "AAACCGTGCTTCCG-1" ...
## .. .. .. .. .. ..@ x : num [1:2286884] 1 1 2 1 1 1 1 41 1 1 ...
## .. .. .. .. .. ..@ factors : list()
## .. .. .. ..@ data :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
## .. .. .. .. .. ..@ i : int [1:2286884] 70 166 178 326 363 410 412 492 494 495 ...
## .. .. .. .. .. ..@ p : int [1:2701] 0 781 2133 3264 4224 4746 5528 6311 7101 7634 ...
## .. .. .. .. .. ..@ Dim : int [1:2] 32738 2700
## .. .. .. .. .. ..@ Dimnames:List of 2
## .. .. .. .. .. .. ..$ : chr "counts"
## .. .. .. ..@ features :Formal class 'LogMap' [package "SeuratObject"] with 1 slot
## .. .. .. .. .. ..@ .Data: logi [1:32738, 1] TRUE TRUE TRUE TRUE TRUE TRUE ...
## .. .. .. .. .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. .. .. .. .. ..$ : chr [1:32738] "ENSG00000243485" "ENSG00000237613" "ENSG00000186092" "ENSG00000238009" ...
## .. .. .. .. .. .. .. ..$ : chr "counts"
## .. .. .. .. .. ..$ dim : int [1:2] 32738 1
## .. .. .. .. .. ..$ dimnames:List of 2
## .. .. .. .. .. .. ..$ : chr [1:32738] "ENSG00000243485" "ENSG00000237613" "ENSG00000186092" "ENSG00000238009" ...
## .. .. .. .. .. .. ..$ : chr [1:2700] "AAACATACAACCAC-1" "AAACATTGAGCTAC-1" "AAACATTGATCAGC-1" "AAACCGTGCTTCCG-1" ...
## .. .. .. .. .. ..@ x : num [1:2286884] 1 1 2 1 1 1 1 41 1 1 ...
## .. .. .. .. .. ..@ factors : list()
## .. .. .. ..@ scale.data : num[0 , 0 ]
## .. .. .. ..@ key : chr "rna_"
## .. .. .. ..@ assay.orig : NULL
## .. .. .. ..@ var.features : logi(0)
## .. .. .. ..@ meta.features:'data.frame': 32738 obs. of 0 variables
## .. .. .. ..@ misc : list()
## .. .. .. .. .. .. ..$ : chr "counts"
## .. .. .. ..@ default : int 1
## .. .. .. ..@ assay.orig: chr(0)
## .. .. .. ..@ meta.data :'data.frame': 32738 obs. of 0 variables
## .. .. .. ..@ misc :List of 1
## .. .. .. .. ..$ calcN: logi TRUE
## .. .. .. ..@ key : chr "rna_"
## ..@ meta.data :'data.frame': 2700 obs. of 3 variables:
## .. ..$ orig.ident : Factor w/ 1 level "PBMC analysis": 1 1 1 1 1 1 1 1 1 1 ...
## .. ..$ nCount_RNA : num [1:2700] 2421 4903 3149 2639 981 ...
@@ -271,20 +281,14 @@ pbmc_small <- CreateSeuratObject(tenX_matrix, #Expression mat
## ..@ project.name: chr "PBMC analysis"
## ..@ misc : list()
## ..@ version :Classes 'package_version', 'numeric_version' hidden list of 1
- ## .. ..$ : int [1:3] 4 1 0
+ ## .. ..$ : int [1:3] 5 0 1
## ..@ commands : list()
## ..@ tools : list()

=== "Dimension of expression matrices"

``` r
dim(pbmc_small@assays$RNA@counts)
```

## [1] 32738 2700

``` r
dim(pbmc_small@assays$RNA@data)
dim(pbmc_small[["RNA"]])
```

## [1] 32738 2700
@@ -306,34 +310,35 @@ pbmc_small <- CreateSeuratObject(tenX_matrix, #Expression mat

We can observe several slots via the `str` command:

- `assays`: general slot that will include the different information of each
study. They are composed of several things:
    - the starting expression matrix (`@counts`), usually raw counts or raw UMI
    - the one that will "undergo" all the modifications (filters,
    normalization, etc) (`@data`)
    - dataframe that will be created when scaling the data (`@scale.data`)
    - prefix used for each calculation that will use this assay (study)
    (`@key`)
    - vector of gene names that will be determined to have a variable
    expression (`@var.features`)
    - dataframe associated with genes with different metadata
    (`@meta.features`)

- `meta.data` : gathers all the information about the cells. At the beginning
Seurat will calculate the size of the library (`nCount_RNA`, or the total
number of UMI) and the number of detected genes (`nFeature_RNA`) for each
cell. If a dataframe is given in the `meta.data` parameter of the
`CreateSeuratObject` function, its columns will be added after those
calculated by Seurat.

- `active.assay` : study used by default
- `active.ident` : default cell identity, here the name of the given project,
also stored under the column `orig.ident` in the metadata
- `assays`: general slot that holds the information of each study. It
is a list where each element is linked to an expression data type.
For instance, the element `RNA` holds all the information linked to
your scRNAseq data. In a more complex study, you can also have `ADT`
or `peaks` if you perform CITEseq, scATACseq or Multiome. Here we have
a simple scRNAseq, so by default `assays` is composed of only the
element `RNA`. Its class is `Assay5`, which is described in detail in
the documentation of the `SeuratObject` package: type `?Assay5` in
your console to read it.
- `meta.data` : gathers all the information about the cells. At the
beginning, Seurat calculates the size of the library (`nCount_RNA`,
the total number of UMI) and the number of detected genes
(`nFeature_RNA`) for each cell. If a dataframe is given in the
`meta.data` parameter of the `CreateSeuratObject` function, its
columns will be added after those calculated by Seurat.
- `active.assay` : study used by default
- `active.ident` : default cell identity, here the name of the given
project, also stored under the column `orig.ident` in the metadata
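
To make the two metadata columns computed by Seurat concrete, here is a small base R illustration on a made-up matrix. The values and names are invented for the example; it only mirrors the definitions given above (library size and number of detected genes, the latter stored by Seurat in the column `nFeature_RNA`), not Seurat's internal code.

```r
# Toy count matrix (made-up values): rows are genes, columns are cells
counts <- matrix(
  c(3, 0, 1,
    0, 0, 2,
    5, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("cell1", "cell2", "cell3"))
)

n_count   <- colSums(counts)       # library size: total counts per cell
n_feature <- colSums(counts > 0)   # number of detected genes per cell

data.frame(nCount_RNA = n_count, nFeature_RNA = n_feature)
```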

!!! note
Navigation in the different slots is done via `@` or `$`. Each main slot is
accessible via `@`, *i.e.* `object@main_slot`. To go further down the slot
tree, complex objects (dgCMatrix, dataframe) are most often accessed with `@`,
while lists and vectors are accessed via `$`. If in doubt, you can
refer to the result of the `str` command and use the character in front of
each slot name.
each slot name. If you want to look at the expression matrix, you can run:

```r
# Retrieve the first 10 genes and 5 cells of the RNA counts matrix
pbmc_small[["RNA"]]$counts[1:10,1:5]
```
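
The `$counts` shortcut is one way to reach a layer; accessor functions are another. The sketch below assumes Seurat v5 and the `Matrix` package are installed, and it builds a tiny made-up object (`toy`, `toy_counts` are hypothetical names) so that it does not depend on the PBMC data.

```r
library(Seurat)
library(Matrix)

# Toy sparse count matrix (made-up values): 4 genes x 3 cells
toy_counts <- Matrix(
  c(1, 0, 3,
    0, 2, 0,
    5, 0, 0,
    0, 0, 4),
  nrow = 4, byrow = TRUE, sparse = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

toy <- CreateSeuratObject(counts = toy_counts, project = "toy")

Layers(toy)                                                   # layers available in the default assay
counts_a <- toy[["RNA"]]$counts                               # `$` shortcut on the assay
counts_b <- LayerData(toy, assay = "RNA", layer = "counts")   # explicit accessor

# Both routes should give a matrix of the same dimensions
identical(dim(counts_a), dim(counts_b))
```

Using an accessor such as `LayerData()` avoids relying on the internal slot layout, which is what changed between the `Assay` and `Assay5` classes.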
12 changes: 6 additions & 6 deletions docs/scRNAseq_basics/intro_markers.md
@@ -79,12 +79,12 @@ kable(head(pbmc_markers))

| | p_val | avg_log2FC | pct.1 | pct.2 | p_val_adj | cluster | gene |
|:----------------|------:|-----------:|------:|------:|----------:|:--------|:----------------|
- | ENSG00000137154 | 0 | 0.6674692 | 0.998 | 0.997 | 0 | 0 | ENSG00000137154 |
- | ENSG00000144713 | 0 | 0.6134636 | 0.998 | 0.997 | 0 | 0 | ENSG00000144713 |
- | ENSG00000112306 | 0 | 0.7002797 | 1.000 | 0.994 | 0 | 0 | ENSG00000112306 |
- | ENSG00000177954 | 0 | 0.7041791 | 0.998 | 0.994 | 0 | 0 | ENSG00000177954 |
- | ENSG00000164587 | 0 | 0.5988615 | 1.000 | 0.997 | 0 | 0 | ENSG00000164587 |
- | ENSG00000118181 | 0 | 0.7149172 | 1.000 | 0.978 | 0 | 0 | ENSG00000118181 |
+ | ENSG00000137154 | 0 | 0.6952948 | 0.998 | 0.997 | 0 | 0 | ENSG00000137154 |
+ | ENSG00000144713 | 0 | 0.6444963 | 0.998 | 0.997 | 0 | 0 | ENSG00000144713 |
+ | ENSG00000112306 | 0 | 0.7383383 | 1.000 | 0.994 | 0 | 0 | ENSG00000112306 |
+ | ENSG00000177954 | 0 | 0.7437027 | 0.998 | 0.994 | 0 | 0 | ENSG00000177954 |
+ | ENSG00000164587 | 0 | 0.6352699 | 1.000 | 0.997 | 0 | 0 | ENSG00000164587 |
+ | ENSG00000118181 | 0 | 0.7973422 | 1.000 | 0.978 | 0 | 0 | ENSG00000118181 |

The result of this function is a dataframe with several columns:
