iPSC Pipeline is a pipeline developed with Snakemake to apply several single cell analysis tools into query single cell transcriptomic datasets. The pipeline requires one reference dataset, with known cell types in its metadata and one query dataset with a metadata file. The output of the pipeline is a series of figures that can be used to generate reports and/or scientific papers.
This pipeline was built for the final undergrad paper (TCC) of Giovanni Alberto Crestani. The paper will be available soon.
The only installation needed for this pipeline is Snakemake. To install Snakemake, go to Snakemake installation guide.
- Clone or download this repo and go to the directory.
git clone https://github.com/gacrestani/ipsc-pipeline
cd ipsc-pipeline
-
Copy the
singularity.img
Singularity container intoworkflow/envs/
. You can download the image here. -
Run the pipeline with
snakemake -c4 --use-singularity
You can change 4
to be the number of cores used by Snakemake.
The pipeline will run and generate the figures using the La Manno et al (2016) dataset.
Edit the configuration file config/config.yml
.
-
datasets
: the directory names of your datasets. There must be a different directory for each query/reference dataset pair. The directory must contain at the following four files:-
query_exp.rda
is therda
file of the expression matrix of your query dataset. This should be a dgCMatrix where columns are samples and rows are genes. -
query_meta.rda
is therda
file of the metadata of your query dataset. Therownames
must be samples. -
train_exp.rda
is therda
file of the expression matrix of your reference dataset. Its structure should be similar toquery_exp.rda
. -
train_meta.rda
is therda
file of the metadata of your reference dataset. Therownames
must be samples.
-
-
Cell_ID_colname
: column name of the sample IDs for the reference and query dataset. -
Cell_type_colname
: column name of the cell types for the reference dataset. -
Cell_description_colname
: column name of the cell grouping in the query dataset. This should be some metadata that classifies the samples into diferent groups. It is optional. -
fig_width
: width of generated figures, in pixels. -
fig_height
: height of generated figures, in pixels. -
fig_res
: resolution of generated figures, in ppi.
Additionaly, there are some tool-specifi configurations. For example, you can set a cell type to be focused on the singleCellNet query violin plot.
You can leave these options as-is if you'd like to first make sure the workflow runs without error on your machine before using your own dataset and custom parameters.
The singularity containter singularity.img
was generated from a Docker container named gacrestani/ipsc_pipeline
. The code to generate the Docker container is in workflow/envs/docker
. The code used to generate the Singularity container from the Docker container was:
sudo singularity build singularity.img docker-daemon://gacrestani/ipsc_pipeline:latest