launch-dCas9

launch-dCas9 is a machine LeArning based UNified CompreHensive framework for predicting gRNA impact from multiple perspectives, including cell fitness, wild-type abundance (gauging power potential), and gene expression in single cells.

launch-dCas9 supports two workflows: deriving predictions for testing data from a pre-trained model, and deriving a trained model from a given training dataset.

Input files

Sequencing model - performs well on the cell fitness task for gRNAs in enhancer regions and on the wild-type counts task
Integrating Sequencing and annotation model - performs well on the cell fitness task for gRNAs in enhancer regions and on the wild-type counts task

We accept a CSV file with the fixed column names ["protospacer", "OGEE_prop_Essential", "deltagb", "deltagh", "H3k27ac", "ATAC", "H3K4me3"] as either training data or testing data. The annotation columns must contain continuous values. launch-dCas9 uses zero-imputation internally.
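To check an input file before handing it to launch-dCas9, something like the following sketch may help. It verifies that the fixed column names are present and fills empty annotation cells with 0.0, in the spirit of zero-imputation. The helper name load_rows, the split into one sequence column plus six annotation columns, and the example row are assumptions made for illustration; this is not the tool's internal code.

```python
import csv
import io

# Column names are taken from the list above. Treating "protospacer" as the
# sequence column and the rest as annotations is an assumption of this sketch.
SEQUENCE_COLUMN = "protospacer"
ANNOTATION_COLUMNS = [
    "OGEE_prop_Essential", "deltagb", "deltagh", "H3k27ac", "ATAC", "H3K4me3",
]


def load_rows(handle):
    """Read a CSV, check the fixed column names, and zero-fill empty annotations."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    required = [SEQUENCE_COLUMN] + ANNOTATION_COLUMNS
    missing = [name for name in required if name not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    rows = []
    for row in reader:
        for name in ANNOTATION_COLUMNS:
            value = (row[name] or "").strip()
            # Empty cells become 0.0; anything else must parse as a float.
            row[name] = float(value) if value else 0.0
        rows.append(row)
    return rows


# Made-up example row with two empty annotation cells (deltagh and ATAC).
example = io.StringIO(
    "protospacer,OGEE_prop_Essential,deltagb,deltagh,H3k27ac,ATAC,H3K4me3\n"
    "ACGTACGTACGTACGTACGT,0.5,-1.2,,3.0,,1.5\n"
)
rows = load_rows(example)
print(rows[0]["deltagh"], rows[0]["ATAC"])  # 0.0 0.0
```

A non-numeric annotation value raises ValueError from float(), which surfaces the "continuous value" requirement early.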

Evaluation

If you want to assess prediction performance on the testing data, the outcome column in the testing data has to be named "significance".
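As an illustration of what such an assessment could look like, the sketch below scores predictions against the "significance" column with AUROC. Which metric launch-dCas9 actually reports is not stated here, so AUROC, the assumption that "significance" holds 0/1 labels, and the function name auroc are all illustrative choices.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        # Tied scores share the average of their 1-based ranks.
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    rank_sum = sum(rank for rank, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical labels (the "significance" column) and model scores.
significance = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
print(auroc(significance, predicted))  # 0.75
```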

Installation

Start by cloning the source code:

git clone https://github.com/Wancen/launch-dCas9.git
cd launch-dCas9

Create a Python virtual environment with conda

conda env create -f environment.yml

or

conda create -n launch-dCas9 --file requirements.txt
conda activate launch-dCas9

Usage

Prediction

You must specify model_path, test_path, test_filename, result_path, and outcome.

CNN

python -W ignore launch-dCas9.py \
    --model CNN \
    --model_path ./exampleData/ \
    --test_path ./exampleData/ \
    --test_filename test.csv \
    --result_path ./exampleData/ \
    --variant seq_anno \
    --outcome promoterFitness

XGBoost

python -W ignore launch-dCas9.py \
    --model XGBoost \
    --model_path ./exampleData/ \
    --test_path ./exampleData/ \
    --test_filename test.csv \
    --result_path ./exampleData/ \
    --variant seq_anno \
    --outcome promoterFitness
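When running predictions for both models from a script, the commands above can be assembled as argument lists instead of being typed by hand. The following is a minimal sketch: the helper name build_predict_command and its default values are hypothetical, the flags are the ones shown in the commands above, and restricting --model to CNN and XGBoost is an assumption based on the two examples.

```python
import shlex
import sys


def build_predict_command(model, outcome, data_dir="./exampleData/",
                          test_filename="test.csv", variant="seq_anno"):
    """Assemble the prediction command shown above as an argument list."""
    # Only the two models that appear in the examples are accepted here.
    if model not in ("CNN", "XGBoost"):
        raise ValueError(f"unsupported model: {model}")
    return [
        sys.executable, "-W", "ignore", "launch-dCas9.py",
        "--model", model,
        "--model_path", data_dir,
        "--test_path", data_dir,
        "--test_filename", test_filename,
        "--result_path", data_dir,
        "--variant", variant,
        "--outcome", outcome,
    ]


# Print the command for each model; nothing is executed here.
for model in ("CNN", "XGBoost"):
    print(shlex.join(build_predict_command(model, "promoterFitness")))
```

To actually run a command, pass the list to subprocess.run(cmd, check=True) from inside the cloned launch-dCas9 directory.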

Get trained model

You must specify model_path, train_path, train_filename, and outcome.

CNN

python -W ignore launch-dCas9.py \
    --model CNN \
    --model_path ./exampleData/ \
    --train_path /proj/milovelab/mu/dukeproj/data/dat_discovery/promoter/ \
    --train_filename wgCERES-gRNAs-k562-discovery-screen-pro_baseMean125-binary-1-train.csv \
    --variant seq_anno \
    --outcome promoterFitness

XGBoost

python -W ignore launch-dCas9.py \
    --model XGBoost \
    --model_path ./exampleData/ \
    --train_path /proj/milovelab/mu/dukeproj/data/dat_discovery/promoter/ \
    --train_filename wgCERES-gRNAs-k562-discovery-screen-pro_baseMean125-binary-1-train.csv \
    --variant seq_anno \
    --outcome promoterFitness
