IM-ML

A machine learning workflow that predicts gene regulon membership from promoter sequence features, focusing on top-down regulons derived from an Independent Component Analysis (ICA) of the PRECISE E. coli RNA-seq database.

What is Independent Component Analysis?

To learn about ICA, how ICA components are computed, and what they can tell you, please visit https://imodulondb.org/about.html

Workflow outline

Generate SigmaFactor PSSMs
Feature Matrix Generation (This generates a ~200MB file necessary for machine learning)
Feature Engineering
Machine learning: model training and hyperparameter optimization
ArcA Direct Repeats motifs to improve model performance
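
As a rough illustration of the first step, a position-specific scoring matrix (PSSM) can be built from aligned binding sites and slid along a promoter to find the best-scoring window. The sketch below is generic and is not taken from the workflow notebooks: the function names `build_pssm` and `best_match`, the pseudocount, the uniform background frequency, and the toy hexamers are all assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PSSM (one dict per position) from equal-length sites.

    Hypothetical helper: assumes a uniform background and a fixed pseudocount.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pssm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pssm.append(
            {base: math.log2((counts[base] / total) / background) for base in ALPHABET}
        )
    return pssm


def best_match(pssm, sequence):
    """Score every window of the sequence and return (best_score, start_index)."""
    width = len(pssm)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        score = sum(col[sequence[start + i]] for i, col in enumerate(pssm))
        best = max(best, (score, start))
    return best


# Toy -10-box-like hexamers, invented for this example (not repository data).
toy_sites = ["TATAAT", "TATAAT", "TACAAT", "TATACT", "GATAAT"]
pssm = build_pssm(toy_sites)
score, start = best_match(pssm, "GGCGTATAATGCGC")
```

The best score and its position are the kind of per-promoter values that could serve as columns in a feature matrix; the features the workflow actually uses are defined in its notebooks.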

Dependencies

The workflow depends on:

bitome: https://github.com/SBRG/bitome
pymodulon: https://github.com/SBRG/pymodulon
DNAshapeR: https://github.com/TsuPeiChiu/DNAshapeR
scikit-learn: https://scikit-learn.org/stable/
seaborn statistical data visualization: https://seaborn.pydata.org/index.html

Recommended package versions are:
Python==3.8
seaborn==0.12.2
numpy==1.24.3
matplotlib==3.7.1
pandas==1.5.3
biopython==1.78

Citation

Qiu, S., Lamoureux, C., Akbari, A., Palsson, B. O., & Zielinski, D. C. (2022). Quantitative sequence basis for the E. coli transcriptional regulatory network. https://doi.org/10.1101/2022.02.20.481200
