cRegulon is an optimization model to identify combinatorial regulon from single cell expression and chromatin accessibility data.

fengzhanying/cTOP


cRegulon

Introduction

This repository contains the cRegulon software, an optimization model that identifies combinatorial regulons from single-cell gene expression (scRNA-seq) and chromatin accessibility (scATAC-seq) data.

Requirements:

  1. Python >= 3.0 with the packages numpy, scikit-learn (imported as sklearn), and scipy
  2. MATLAB >= 2021
  3. HOMER
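
Before installing, it can help to confirm that the requirements above are visible from the current environment. The following is a minimal sketch using only the Python standard library. The executable names `matlab` and `findMotifsGenome.pl` (a HOMER script) are assumptions about how these tools appear on your PATH, and `check_requirements` is a hypothetical helper that is not part of cRegulon.

```python
import importlib.util
import shutil

# Import names of the Python packages listed in the requirements.
PYTHON_PACKAGES = ["numpy", "sklearn", "scipy"]
# Assumed executable names; adjust them to match your installation.
EXECUTABLES = ["matlab", "findMotifsGenome.pl"]


def check_requirements(packages=PYTHON_PACKAGES, executables=EXECUTABLES):
    """Return a dict mapping each requirement name to True (found) or False."""
    status = {}
    for name in packages:
        # find_spec returns None when a top-level module cannot be located.
        status[name] = importlib.util.find_spec(name) is not None
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH.
        status[exe] = shutil.which(exe) is not None
    return status


if __name__ == "__main__":
    for requirement, found in check_requirements().items():
        print(f"{requirement}: {'found' if found else 'MISSING'}")
```

This only checks that each tool can be found; it does not verify version numbers.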

Install cRegulon with the following commands:

wget https://github.com/fengzhanying/cRegulon/archive/master.zip
unzip master.zip
cd cRegulon-master
wget https://www.dropbox.com/s/0h1wxlu7iqheajo/cRegulon.tar.gz
tar -xzvf cRegulon.tar.gz

Step 1: Single-cell data preprocessing

A typical input file (RAd4_scRNA.txt) for scRNA-seq data is a gene-by-cell count matrix:

scRNA Cell1 Cell2 Cell3
Gene1 5 0 3
Gene2 0 2 0
Gene3 1 0 0

A typical input file (RAd4_scATAC.txt) for scATAC-seq data is a peak-by-cell count matrix:

scATAC Cell1 Cell2 Cell3 Cell4
Peak1 1 0 1 0
Peak2 0 1 0 1
Peak3 1 0 0

The peaks are named in the format "chr_start_end".
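
The two input formats above can be checked before preprocessing. The sketch below parses a feature-by-cell count matrix and flags peak names that do not follow "chr_start_end". It assumes whitespace-delimited files with integer counts, which the source does not state explicitly. `read_count_matrix` and `invalid_peaks` are hypothetical helpers, not part of cRegulon, and the peak names in the example are made up.

```python
import io
import re

# chromosome name, then start, then end, joined by underscores
PEAK_NAME = re.compile(r"^(\S+)_(\d+)_(\d+)$")


def read_count_matrix(handle):
    """Parse a feature-by-cell count matrix into (cells, features, rows)."""
    header = handle.readline().split()
    cells = header[1:]  # the first header field is a label such as "scRNA"
    features, rows = [], []
    for line_no, line in enumerate(handle, start=2):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(cells) + 1:
            raise ValueError(f"line {line_no}: expected {len(cells)} counts")
        features.append(fields[0])
        rows.append([int(value) for value in fields[1:]])
    return cells, features, rows


def invalid_peaks(names):
    """Return the names that are not 'chr_start_end' with start < end."""
    bad = []
    for name in names:
        match = PEAK_NAME.match(name)
        if match is None or int(match.group(2)) >= int(match.group(3)):
            bad.append(name)
    return bad


if __name__ == "__main__":
    example = io.StringIO(
        "scATAC Cell1 Cell2\nchr1_3000_3500 1 0\nchr2_100_600 0 1\n"
    )
    cells, peaks, counts = read_count_matrix(example)
    print(cells, peaks, counts, invalid_peaks(peaks))
```

For a real file, pass an open file handle, for example `open("RAd4_scATAC.txt")`, instead of the in-memory example.
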
We run the following script to generate the gene expression matrix and gene activity matrix:

source Preprocessing.sh RAd4

This produces the gene expression file (RAd4_GE.txt) and the gene activity file (RAd4_GA.txt).

Step 2: Constructing TF-TG regulatory network by pseudo-bulk strategy

With the input files RAd4_scRNA.txt and RAd4_scATAC.txt, we run the following script:

source PS_PECA.sh RAd4 mm10

This produces the TF-RE-TG triplet file (RAd4_network.txt) and the TF-TG regulatory strength file (RAd4_TRS.txt).
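
To illustrate the general idea behind a pseudo-bulk strategy, the sketch below sums single-cell counts over groups of cells so that each group behaves like one bulk sample. This is a generic illustration, not the implementation in PS_PECA.sh. How that script groups cells is not described here, so the cluster labels and the `pseudo_bulk` helper are hypothetical.

```python
def pseudo_bulk(rows, cell_labels):
    """Sum counts over cells that share a label.

    rows: feature-by-cell counts, one list per feature
    cell_labels: one group label per cell (column)
    Returns (labels, profiles), where profiles[i][j] is the summed count
    of feature i over all cells whose label is labels[j].
    """
    labels = sorted(set(cell_labels))
    column = {label: j for j, label in enumerate(labels)}
    profiles = []
    for row in rows:
        if len(row) != len(cell_labels):
            raise ValueError("each row needs one count per cell label")
        summed = [0] * len(labels)
        for count, label in zip(row, cell_labels):
            summed[column[label]] += count
        profiles.append(summed)
    return labels, profiles


if __name__ == "__main__":
    # Counts from the scRNA-seq example in Step 1 (Gene1..Gene3, Cell1..Cell3).
    counts = [[5, 0, 3], [0, 2, 0], [1, 0, 0]]
    # Hypothetical grouping: Cell1 and Cell3 in group "A", Cell2 in group "B".
    print(pseudo_bulk(counts, ["A", "B", "A"]))
```

Summing within groups reduces the sparsity of single-cell counts, which is the usual motivation for pseudo-bulk aggregation.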

Step 3: Constructing TF-TF combinatorial network

With the input TF-TG regulatory strength file (RAd4_TRS.txt), we run the following script:

source runCSI.sh RAd4

This will generate the normalized TF-TG regulatory strength file (RAd4_TRS.txt) and the TF-TF combinatorial network (RAd4_CSI.txt).
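
The sketch below shows one way a TF-TF score can be derived from TF-TG regulatory strengths. It assumes that "CSI" refers to the connection specificity index, which this README does not confirm. In the formulation used here, the score of a TF pair (A, B) is the fraction of TFs whose Pearson correlation with both A and B falls below PCC(A, B) minus a margin of 0.05. The functions, the margin default, and the toy profiles are illustrative assumptions, not the code in runCSI.sh.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def csi_matrix(profiles, margin=0.05):
    """profiles: dict mapping TF name -> regulatory strengths over the same TGs.

    Returns (tfs, csi), where csi[a][b] is the fraction of TFs whose
    correlation with both a and b is below pearson(a, b) - margin.
    """
    tfs = sorted(profiles)
    pcc = {a: {b: pearson(profiles[a], profiles[b]) for b in tfs} for a in tfs}
    csi = {}
    for a in tfs:
        csi[a] = {}
        for b in tfs:
            cutoff = pcc[a][b] - margin
            below = sum(
                1 for c in tfs if pcc[a][c] < cutoff and pcc[b][c] < cutoff
            )
            csi[a][b] = below / len(tfs)
    return tfs, csi


if __name__ == "__main__":
    # Toy TF-by-TG regulatory strengths (made-up numbers for illustration).
    toy = {
        "TF1": [1, 2, 3, 4],
        "TF2": [2, 4, 6, 8],
        "TF3": [4, 3, 2, 1],
        "TF4": [1, -1, -1, 1],
    }
    tfs, csi = csi_matrix(toy)
    print(csi["TF1"]["TF2"], csi["TF1"]["TF3"])
```

A pair scores high only when its correlation stands out against the correlations both TFs have with the rest of the network, which is why the index is called specific.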

Step 4: Running cRegulon model

Taking the TF-TF combinatorial network (C.txt), the normalized TF-TG regulatory strength matrix (R.txt), the gene expression matrix (GE.txt), and the gene activity matrix (GA.txt) as input, we run the cRegulon model:

source cRegulon.sh RAd4

This will output:

  1. TF combinatorial effects in all cRegulons: X.txt
  2. cRegulon combination coefficients for scRNA-seq: H1.txt
  3. cRegulon combination coefficients for scATAC-seq: H2.txt
  4. TF modules of cRegulons: TFs (*TF.txt) and TF pairs (*TFPair.txt).
  5. Regulatory sub-network of each cRegulon: *SubNet.txt
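
One simple way to inspect the combination coefficients is to find, for each cell, the cRegulon with the largest coefficient. The sketch below assumes the coefficients are arranged with one row per cRegulon and one column per cell. The actual layout of H1.txt and H2.txt is not documented here, so check it before use. `dominant_cregulon` is a hypothetical helper and the numbers are made up.

```python
def dominant_cregulon(coefficients):
    """Return, for each cell, the 0-based index of the largest coefficient.

    coefficients: list of rows, one per cRegulon, each with one value per cell
    (assumed orientation; transpose first if your file is cell by cRegulon).
    """
    if not coefficients:
        return []
    n_cells = len(coefficients[0])
    if any(len(row) != n_cells for row in coefficients):
        raise ValueError("all rows must have the same number of cells")
    return [
        max(range(len(coefficients)), key=lambda k: coefficients[k][j])
        for j in range(n_cells)
    ]


if __name__ == "__main__":
    # Two cRegulons (rows) by three cells (columns); illustrative values only.
    example = [[0.9, 0.1, 0.4], [0.1, 0.7, 0.6]]
    print(dominant_cregulon(example))
```

Ties are resolved in favor of the cRegulon with the lower index, because `max` returns the first maximal element.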

Citation:

If you use the cRegulon software or cRegulon-associated concepts, please cite:

Zhanying Feng, et al. Modeling combinatorial regulon from single cell gene expression and chromatin accessibility data. 2023.
