
A novel improved pipeline for riboswitch classification based on SMOTE etc.


solshiferaw/Riboswitch

 
 


Riboswitch

Over the last decade, RNA sequencing technology and computational methods have given a huge impetus to riboswitch research.

One of the main challenges in riboswitch classification is imbalanced data.

Previously published classifiers were all trained on untreated imbalanced data. As a result they tend to neglect the minority classes and favour the majority class, which produces skewed performance.
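SMOTE (Synthetic Minority Over-sampling Technique) counters this by creating synthetic minority-class samples instead of duplicating existing ones. The balanced models here are presumably built with imbalanced-learn (listed under Dependencies), whose `imblearn.over_sampling.SMOTE` class is the full implementation. The pure-Python sketch below only illustrates the interpolation idea on feature vectors such as k-mer count rows; the function `smote` and its parameters are hypothetical and not part of this repository.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from a list of minority-class vectors.

    Illustrative sketch (hypothetical helper): for each synthetic point, pick a
    random minority sample, find its k nearest minority neighbours by Euclidean
    distance, choose one of them at random and place the new point at a random
    position on the segment between the two.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority class, excluding itself
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: sq_dist(base, m))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic
```

For example, `smote(minority_rows, len(majority_rows) - len(minority_rows))` would generate enough synthetic rows to match a majority class (both row lists are hypothetical). Interpolated k-mer counts are generally non-integer.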

This repository contains the machine learning model selection and performance evaluation (sensitivity, specificity, accuracy and F-score) parts of the pipeline.
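For a multi-class problem such as riboswitch family classification, sensitivity and specificity are computed one-vs-rest for each class: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and the F-score is the harmonic mean of precision and sensitivity. The sketch below computes these per class, plus overall accuracy, in plain Python; `per_class_metrics` is a hypothetical helper, not a function from this repository. scikit-learn's `classification_report` covers precision, recall (sensitivity) and F-score but not specificity, which is why it is derived explicitly here.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """One-vs-rest sensitivity, specificity and F-score per class, plus accuracy."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    n = len(y_true)
    report = {}
    for c in labels:
        tp = pairs[(c, c)]
        fn = sum(v for (t, p), v in pairs.items() if t == c and p != c)
        fp = sum(v for (t, p), v in pairs.items() if t != c and p == c)
        tn = n - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else 0.0  # recall / true positive rate
        spec = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
        prec = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
        report[c] = {"sensitivity": sens, "specificity": spec, "f_score": f1}
    accuracy = sum(pairs[(c, c)] for c in labels) / n
    return report, accuracy
```

With `y_true = ["A", "A", "A", "B", "B", "C"]` and `y_pred = ["A", "A", "B", "B", "B", "C"]`, class A gets sensitivity 2/3 and specificity 1.0, class B gets sensitivity 1.0 and specificity 0.75, and accuracy is 5/6.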

Workflow

[Workflow diagram]

Tutorials

model_selection.ipynb

  1. Read in the cleaned riboswitch k-mer matrix CSV file. Each row holds a class label (the riboswitch family name) followed by its k-mer counts:

    class          kmer1   kmer2   ...   kmerN
    Family name 1  k-mer counts
    Family name 2  ...
    ...            ...
    Family name M  k-mer counts
  2. Generate a fixed training set and test set and save them in the home directory.

  3. Apply 10-fold cross-validation to six algorithms to find the best-performing parameters for each. The script saves all best models, both balanced and imbalanced, in the Model folder.
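The six algorithms and their parameter grids are not listed here, so the following is only a hedged illustration of the cross-validation step. It builds stratified folds, which keep each fold's class proportions close to those of the full data set; this matters for imbalanced riboswitch families, where a plain random split can leave a rare family out of a fold entirely. Whether the notebook stratifies its folds is an assumption (with scikit-learn, `GridSearchCV(..., cv=10)` on a classifier uses `StratifiedKFold` by default). `stratified_folds` and `cv_splits` are hypothetical names.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds so that every fold keeps
    roughly the same class proportions as the full data set."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):  # deterministic class order
        idxs = by_class[y]
        rng.shuffle(idxs)
        # deal indices round-robin, continuing where the previous class stopped
        for j, idx in enumerate(idxs):
            folds[(offset + j) % k].append(idx)
        offset = (offset + len(idxs)) % k
    return folds


def cv_splits(labels, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    folds = stratified_folds(labels, k, seed)
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(val)
```

Each candidate parameter setting would be trained on `train` and scored on `val` for all ten splits, and the setting with the best mean score kept.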

classification report.ipynb

  1. Load the models generated by model_selection.ipynb.
  2. Load the training set and the test set.
  3. Generate a classification report in an automatically created folder named "classification report".
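A minimal sketch of the report-writing step, assuming one plain-text report per model: `write_report`, the file naming and the metric dictionary layout are hypothetical. In practice `sklearn.metrics.classification_report`, which returns a formatted string, would typically produce the table itself; the point here is creating the output folder on first use.

```python
import os


def write_report(model_name, metrics, out_dir="classification report"):
    """Write a per-class metrics table for one model to <out_dir>/<model_name>.txt.

    `metrics` maps class label -> {metric name: value} (hypothetical layout).
    The folder is created if missing; exist_ok avoids errors on reruns.
    """
    os.makedirs(out_dir, exist_ok=True)
    columns = sorted({name for row in metrics.values() for name in row})
    lines = ["class".ljust(20) + "".join(c.rjust(14) for c in columns)]
    for label in sorted(metrics):
        row = metrics[label]
        # missing metrics are written as nan rather than raising KeyError
        values = "".join(f"{row.get(c, float('nan')):14.3f}" for c in columns)
        lines.append(str(label).ljust(20) + values)
    path = os.path.join(out_dir, f"{model_name}.txt")
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    return path
```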

figures.ipynb

  1. Load the models generated by model_selection.ipynb.
  2. Generate confusion matrices and other figures. All figures are saved in the Figures folder.
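A confusion matrix counts, for every true class (rows), how often each class was predicted (columns). The sketch below builds one in plain Python and can normalise each row so the diagonal equals per-class sensitivity; `confusion_counts` is a hypothetical name. In the notebook a matrix like this would typically come from `sklearn.metrics.confusion_matrix` and be drawn with `seaborn.heatmap`, but the exact plotting code is an assumption.

```python
def confusion_counts(y_true, y_pred, labels=None, normalize=False):
    """Return (labels, matrix) with rows = true class, columns = predicted class.

    With normalize=True each row is divided by its total, so the diagonal
    shows per-class sensitivity; rows with no samples stay all zero.
    """
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    pos = {c: i for i, c in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[pos[t]][pos[p]] += 1
    if normalize:
        normed = []
        for row in matrix:
            total = sum(row)
            normed.append([v / total if total else 0.0 for v in row])
        matrix = normed
    return labels, matrix
```

A row-normalised matrix is often easier to read for imbalanced data, since large families no longer dominate the colour scale.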

The following three notebooks serve other purposes and are not required for the workflow:

ribo_colormap_produce_kmerfamily.ipynb

ribo_colormap_input_txt.ipynb

feature_selection

Dependencies

Python

seaborn==0.9.0
pandas==0.24.2
PDPbox==0.2.0
shap==0.29.1
numpy==1.16.2
imbalanced_learn==0.4.3
matplotlib==3.0.3
ipython==7.5.0
imblearn==0.0
scikit_learn==0.21.2

To install the packages above:

  1. Change to the project's home directory, which contains the file "requirements.txt".
  2. Run:

      pip install -r requirements.txt
