Skip to content

Latest commit

 

History

History
29 lines (22 loc) · 1.7 KB

README.md

File metadata and controls

29 lines (22 loc) · 1.7 KB

Project Description

This project demonstrates the application of Machine Learning, specifically deep neural networks, in identifying transcription-factor binding sites in DNA sequences. Utilizing a synthetic dataset of DNA sequences, the goal is to predict the presence of a specific transcription factor indicated by labels (1 or 0) using a developed machine learning model.

Dataset

The dataset consists of DNA sequences that are either regulated by a specific transcription factor (label 1) or not (label 0).

Methodology

Data Loading and Preprocessing: The data is loaded and preprocessed using one-hot encoding to transform the DNA sequences into a matrix format, suitable for machine learning algorithms.

Model Building:

The project involves building and training a deep neural network, with convolutional layers to detect motifs indicative of binding sites.

Model Evaluation:

The model is evaluated on a test dataset, with performance metrics such as accuracy and a confusion matrix being computed.

Saliency Map Analysis:

To interpret the model, saliency maps are computed to identify which parts of the DNA sequences most influence the model's predictions.

Implementation

The repository contains a Jupyter Notebook with detailed steps and code for data preprocessing, model building, and evaluation. Users can experiment with different model architectures and parameters.

Requirements

Python Libraries: TensorFlow, Keras, Pandas, Scikit-Learn, Matplotlib, Seaborn

Instructions

Clone the repository. Install required libraries. Run the Jupyter Notebook, following the instructions and completing the incomplete cells. Experiment with modifying the model and preprocessing steps to explore different outcomes.