SBRG/IM-ML

IM-ML

A machine learning workflow that predicts gene regulon membership from promoter sequence features, focusing on top-down regulons derived from an Independent Component Analysis (ICA) of the PRECISE E. coli RNA-seq database.

What is Independent Component Analysis?

To learn about ICA, how ICA components are computed, and what they can tell you, please visit https://imodulondb.org/about.html

Workflow outline

  1. Generate SigmaFactor PSSMs (position-specific scoring matrices)
  2. Feature matrix generation (this produces a ~200 MB file required for machine learning)
  3. Feature engineering
  4. Machine learning: model training and hyperparameter optimization
  5. ArcA direct repeat motifs to improve model performance
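
To make steps 1 and 2 more concrete, here is a minimal sketch of how a log-odds PSSM can be built from aligned binding sites and then scanned across a promoter to produce a single numeric feature. It is an illustration only, not the code in this repository: the function names (`build_pssm`, `score_window`, `best_match`), the example sites, the uniform background, and the pseudocount are all assumptions made for the example.

```python
import math

BASES = "ACGT"


def build_pssm(sites, background=None, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length, aligned binding sites.

    Hypothetical helper: returns one dict per motif position mapping each
    base to log2(observed frequency / background frequency).
    """
    if not sites:
        raise ValueError("at least one site is required")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    # Assumption: uniform background unless the caller supplies one.
    background = background or {base: 0.25 for base in BASES}
    total = len(sites) + pseudocount * len(BASES)
    pssm = []
    for pos in range(width):
        column = [site[pos].upper() for site in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total)
                            / background[base])
            for base in BASES
        })
    return pssm


def score_window(pssm, window):
    """Sum the per-position log-odds scores for one window (A/C/G/T only)."""
    return sum(column[base] for column, base in zip(pssm, window.upper()))


def best_match(pssm, promoter):
    """Return (score, start) of the highest-scoring window in the promoter."""
    width = len(pssm)
    if len(promoter) < width:
        raise ValueError("promoter is shorter than the motif")
    return max(
        (score_window(pssm, promoter[i:i + width]), i)
        for i in range(len(promoter) - width + 1)
    )


# Made-up sites resembling a -10 box; not taken from the repository's data.
example_sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT", "GATAAT"]
example_pssm = build_pssm(example_sites)
example_score, example_start = best_match(example_pssm, "GGCGCTATAATGCC")
print(f"best window starts at {example_start} with score {example_score:.2f}")
```

In a feature matrix, the best score and its position for each motif could become two columns per gene. The sketch does not handle ambiguous bases such as N.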

Dependencies

The workflow depends on:

  1. bitome: https://github.com/SBRG/bitome
  2. pymodulon: https://github.com/SBRG/pymodulon
  3. DNAshapeR: https://github.com/TsuPeiChiu/DNAshapeR
  4. scikit-learn: https://scikit-learn.org/stable/
  5. seaborn (statistical data visualization): https://seaborn.pydata.org/index.html

Recommended package versions are:
Python==3.8
seaborn==0.12.2
numpy==1.24.3
matplotlib==3.7.1
pandas==1.5.3
biopython==1.78
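
One way to compare an existing environment against the recommended versions above is a short standard-library check. The sketch below is not part of the repository; the helper name `check_versions` and the report layout are hypothetical, and it assumes the packages were installed under the distribution names listed above.

```python
import sys
from importlib import metadata

# Recommended versions copied from the list above.
RECOMMENDED = {
    "seaborn": "0.12.2",
    "numpy": "1.24.3",
    "matplotlib": "3.7.1",
    "pandas": "1.5.3",
    "biopython": "1.78",
}


def check_versions(recommended):
    """Map each package to (recommended, installed or None, versions match)."""
    report = {}
    for name, wanted in recommended.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


print(f"Python {sys.version_info.major}.{sys.version_info.minor} (recommended: 3.8)")
for name, (wanted, installed, ok) in check_versions(RECOMMENDED).items():
    status = "ok" if ok else "differs"
    print(f"{name}: recommended {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the workflow will fail; the listed versions are recommendations.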

Citation

Qiu, S., Lamoureux, C., Akbari, A., Palsson, B. O., & Zielinski, D. C. (2022). Quantitative sequence basis for the E. coli transcriptional regulatory network. https://doi.org/10.1101/2022.02.20.481200
