ML4GLand is a community that develops and maintains tools (primarily in Python) for genomic sequence-based machine learning.
Deep learning has become a popular tool for investigating gene regulation, including DNA and RNA protein-binding specificity, chromatin state and architecture, and transcriptional activity. However, executing a typical workflow for building and interpreting deep learning models remains a challenge. Training nuances specific to genomics data, along with complex preprocessing and interpretation methods, create an especially steep learning curve, and the heterogeneous implementations of most code released with publications hinder reproducibility and extensibility. A tool that exposes existing data, models, and methods to computational scientists, and that can also serve as a platform for development, will greatly improve our ability to use sequence-based machine learning to interrogate gene regulatory mechanisms.
⭐ We aim to build a framework for developing sequence-to-function deep learning models.
Previous work has shown the utility of such frameworks; DeepChem and scverse are excellent examples. Our mission is to put together a similar ecosystem for sequence-based genomics.
- SeqPro -- a Python package for processing DNA/RNA sequences for machine learning.
- SeqData -- a Python package for preparing machine learning-ready genomic sequence datasets.
- SeqExplainer -- a Python package for interpreting sequence-to-function machine learning models.
- EUGENe -- a Python package for streamlining and customizing end-to-end deep-learning sequence analyses in regulatory genomics.
- SeqDatasets -- a repository for downloading datasets and loading them with SeqData.
- MotifData -- a Python package for handling motifs.
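
To give a feel for the kind of sequence preprocessing these packages target, here is a minimal sketch in plain Python. It is not the SeqPro API: the function names `one_hot_encode` and `reverse_complement`, the `ACGT` column order, and the choice to encode unknown bases as all zeros are assumptions made for illustration only.

```python
# Illustrative only: plain Python, not the API of SeqPro or any ML4GLand package.
ALPHABET = "ACGT"  # assumed column order for the one-hot matrix


def one_hot_encode(seq, alphabet=ALPHABET):
    """Return one row per base; bases outside the alphabet (e.g. N) become all zeros."""
    index = {base: i for i, base in enumerate(alphabet)}
    encoded = []
    for base in seq.upper():
        row = [0] * len(alphabet)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


def reverse_complement(seq):
    """Reverse-complement a DNA string, leaving N unchanged."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


example = "ACGTN"
encoded_example = one_hot_encode(example)    # 5 rows x 4 columns
rc_example = reverse_complement(example)     # often used as a training augmentation
```

In practice a dedicated package would typically do this on arrays rather than Python lists, so treat the sketch as a description of the transformation, not of how the tools above implement it.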