A Nextflow pipeline for the CombiGEM-CRISPR screening platform.

yuukiiwa/CombiPIPE

CombiPIPE

Analysis pipelines for combinatorial screens done using the CombiGEM-CRISPR and CombiSEAL (coming soon) platforms.

Before running

Install the DescTools package in R (for example, from RStudio). Then open a terminal (any command line window) and install the regex, scipy, and numpy packages for Python. If you use Python 3, use pip3 instead of pip.

pip install regex
pip install scipy
pip install numpy
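To confirm the Python packages installed correctly before launching the pipeline, a quick check along these lines may help. This is a minimal sketch, not part of CombiPIPE: the helper name `missing_modules` is hypothetical, and it only tests whether each module can be located by the interpreter you run it with.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The three packages listed in the installation step above.
    missing = missing_modules(["regex", "scipy", "numpy"])
    if missing:
        print("Missing Python packages:", ", ".join(missing))
    else:
        print("All required Python packages can be found.")
```

Run it with the same interpreter (python or python3) that the pipeline will use, since pip and pip3 can install into different environments.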

Install Nextflow and add the directory containing the nextflow executable to your PATH:

curl -s https://get.nextflow.io | bash
export PATH=$PATH:</path/to/file>

Download JDK 8 from here. A dmg or exe installer works well. Open it once downloaded and follow the installer's instructions. On macOS, then point JAVA_HOME at the installed JDK:

export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)

Create a sample info CSV file containing the sample index sequences, sample names, and conditions, one sample per line:

GATCAATGTTC,SA162,1
CGATCTGGCGAA,SA163,1
TCGTTCCTG,SA164,0
ATCAGAACAT,SA165,0
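A malformed sample info file is easier to catch before the run than after. The sketch below parses the three-column layout shown above and rejects rows with unexpected content. The function name `read_sample_info` is hypothetical, and the checks (index sequences made only of A, C, G, T; integer condition values) are assumptions drawn from the example rows, not rules documented by the pipeline.

```python
import csv
import io


def read_sample_info(handle):
    """Parse rows of index_sequence,sample_name,condition into dictionaries."""
    samples = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got {len(row)}: {row!r}")
        index_seq, name, condition = (field.strip() for field in row)
        # Assumption: index sequences contain only unambiguous bases.
        if set(index_seq) - set("ACGT"):
            raise ValueError(f"non-ACGT character in index {index_seq!r}")
        samples.append(
            {"index": index_seq, "name": name, "condition": int(condition)}
        )
    return samples


example = io.StringIO("GATCAATGTTC,SA162,1\nTCGTTCCTG,SA164,0\n")
for sample in read_sample_info(example):
    print(sample)
```

To check a real file, pass an open file handle instead of the `io.StringIO` example, for instance `open("sampleInfo.csv", newline="")`.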

Create a barcode CSV file containing the barcodes and their keys, one barcode per line:

AAGCGAGT,1
CTCTAGGT,2
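The barcode file can be checked the same way. This sketch loads the two-column layout into a dictionary and flags duplicate barcodes. The name `read_barcodes` is hypothetical, and treating a repeated barcode as an error is an assumption rather than documented pipeline behaviour. Keys are kept as strings so they can be compared directly with the names passed to the pipeline on the command line.

```python
import csv
import io


def read_barcodes(handle):
    """Parse rows of barcode,key into a {barcode: key} dictionary."""
    barcode_to_key = {}
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
        barcode, key = (field.strip() for field in row)
        # Assumption: each barcode should map to exactly one key.
        if barcode in barcode_to_key:
            raise ValueError(f"duplicate barcode {barcode!r}")
        barcode_to_key[barcode] = key
    return barcode_to_key


example = io.StringIO("AAGCGAGT,1\nCTCTAGGT,2\n")
print(read_barcodes(example))
```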

Running the pipelines

The first three processes are the same in both pipelines.

  1. extract samples
  2. extract barcodes
  3. calculate the lg fold change and -log10 P value between the initial and experimental groups
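To make the third step concrete, the sketch below shows one common way to compute a log fold change from read counts and to transform a P value. It is an illustration under stated assumptions, not the pipeline's own code: the function names, the normalisation by total reads, the pseudocount of 1, and the default base of 2 are all assumptions, and the statistical test that produces the P value is not shown.

```python
import math


def log_fold_change(initial_count, experimental_count,
                    initial_total, experimental_total,
                    pseudocount=1.0, base=2):
    """Log ratio of experimental to initial read frequency.

    Assumptions: counts are normalised by the total reads in each sample,
    and a pseudocount avoids taking the log of zero.
    """
    initial_freq = (initial_count + pseudocount) / initial_total
    experimental_freq = (experimental_count + pseudocount) / experimental_total
    return math.log(experimental_freq / initial_freq, base)


def neg_log10_p(p_value):
    """Transform a P value so that smaller P values give larger scores."""
    return -math.log10(p_value)


# A combination whose frequency doubles between the two groups.
print(log_fold_change(99, 199, 1000, 1000))
print(neg_log10_p(0.01))
```

With these two values per barcode combination, each combination can be placed on a volcano-style plot of effect size against significance.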

For CombiGEM (supports screens with two or more dimensions)

  1. calculate gene-level genetic interaction
  2. calculate Dunnett test p-values for each gene-level combination

nextflow CombiGEM.nf \
      --fastq <fastq> \
      --sampinfo <sampleInfo.csv> \
      --barcodes <barcode_list.csv> \
      --pattern <1st 7 letters from the fastq file> \
      --dimensions <number of gRNAs> \
      --linker <barcode-connecting sequence> \
      --dummysgs <dummy sgRNA names separated by "," (e.g.: 1,2)>
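The --dimensions and --linker parameters describe how barcodes are laid out in each read: a given number of barcodes joined by the barcode-connecting sequence. The sketch below illustrates that layout with a regular expression. It is a hypothetical illustration, not the pipeline's extraction code: the function names, the fixed barcode length of 8 (taken from the example barcode file), and the use of the standard `re` module rather than `regex` are all assumptions.

```python
import re


def build_barcode_pattern(dimensions, linker, barcode_length=8):
    """Compile a pattern for `dimensions` barcodes joined by `linker`.

    Assumption: every barcode has the same fixed length.
    """
    barcode = f"([ACGT]{{{barcode_length}}})"
    return re.compile(linker.join([barcode] * dimensions))


def extract_barcodes(read, pattern):
    """Return the tuple of barcodes found in a read, or None if absent."""
    match = pattern.search(read)
    return match.groups() if match else None


# A made-up read carrying three barcodes joined by the linker CAATTC.
pattern = build_barcode_pattern(dimensions=3, linker="CAATTC")
read = "TT" + "AAGCGAGT" + "CAATTC" + "CTCTAGGT" + "CAATTC" + "AAGCGAGT" + "GG"
print(extract_barcodes(read, pattern))
```

Each extracted barcode can then be looked up in the barcode CSV file to obtain its key.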

For CombiSEAL (coming soon)

  1. epistasis calculation (based on FACS sorting data)

Try running the pipeline with the test-dataset

The test dataset is a minimal dataset that contains 3 percent of all reads from the ovarian cancer 3-way analysis described in the Cell Reports study by Zhou et al.

  1. clone or download this repository to your computer
  2. change into the downloaded directory in your command line window
  3. decompress the test data by typing gunzip test-dataset/OVARIAN_testdataset.fastq.gz
  4. run the pipeline with nextflow by typing the following:
nextflow CombiGEM.nf --fastq test-dataset/OVARIAN_testdataset.fastq --sampinfo test-dataset/sampleInfo.csv --barcodes test-dataset/barcode_list.csv --pattern @D00691 --dimensions 3 --linker CAATTC --dummysgs 1,2

If the pipeline runs successfully, this is what you will see:

Publications

CombiGEM-CRISPR

  1. https://www.nature.com/articles/nbt.3326
  2. https://www.pnas.org/content/113/9/2544.short
  3. https://www.cell.com/cell-reports/fulltext/S2211-1247(20)31005-6#secsectitle0070

CombiSEAL

  1. https://www.nature.com/articles/s41592-019-0473-0
