SCEVT

SCEVT is a tool for easily visualizing and analyzing scaffolds during de novo genome assembly.

SCEVT consists of two scripts:

  • scaal.py
  • scaphy.py

scaphy.py (Scaffold to Physical Reference Mapping)

scaphy is a tool to visualize scaffolds in relation to a reference genome assembly. Specifically, it draws the gaps within each scaffold (especially helpful for BioNano-assisted scaffolds) and draws mappings to a reference chromosome wherever the genes match. It also highlights genes on the scaffold that are not on the specified chromosome of the reference genome (meaning you have probably anchored a new contig).
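In BioNano hybrid scaffolds, gaps are typically stretches of N characters in the sequence. As a minimal sketch, assuming gaps are encoded as runs of N (this is not taken from the SCEVT source), their coordinates could be located like this; `find_gaps` and its `min_length` parameter are hypothetical names:

```python
import re

# Assumption: gaps are encoded as runs of 'N' (or 'n') in the scaffold sequence.
GAP_PATTERN = re.compile(r"[Nn]+")


def find_gaps(sequence, min_length=1):
    """Return 0-based, end-exclusive (start, end) pairs for each run of Ns.

    min_length (hypothetical parameter) skips runs shorter than this, e.g.
    single ambiguous bases that are not real scaffolding gaps.
    """
    gaps = []
    for match in GAP_PATTERN.finditer(sequence):
        start, end = match.span()
        if end - start >= min_length:
            gaps.append((start, end))
    return gaps


if __name__ == "__main__":
    scaffold = "ACGTNNNNNACGTACNNGT"
    print(find_gaps(scaffold))                # [(4, 9), (15, 17)]
    print(find_gaps(scaffold, min_length=3))  # [(4, 9)]
```

With Biopython (already a dependency), the same function could be applied to every scaffold in a FASTA file by iterating over `SeqIO.parse("scaffolds.fasta", "fasta")` and passing `str(record.seq)`; the file name is only a placeholder.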

Here is an example output (image: example scaphy output).
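The check behind scaphy's highlighting of genes that are not on the chosen reference chromosome can be sketched as follows, assuming you already know which reference chromosome each gene maps to (for example, from the gmap mapping mentioned under Installation). The function name and input format are hypothetical, not scaphy's actual interface; genes with no reference mapping at all are reported separately from genes mapped to a different chromosome:

```python
def flag_off_chromosome_genes(scaffold_genes, gene_to_ref_chrom, target_chrom):
    """Split the genes on a scaffold by where they sit in the reference.

    scaffold_genes: gene IDs in scaffold order (hypothetical input).
    gene_to_ref_chrom: dict mapping gene ID -> reference chromosome name.
    target_chrom: the reference chromosome the scaffold is drawn against.

    Returns (on_target, off_target, unmapped), each preserving scaffold order.
    """
    on_target, off_target, unmapped = [], [], []
    for gene in scaffold_genes:
        chrom = gene_to_ref_chrom.get(gene)
        if chrom is None:
            # No reference position known for this gene.
            unmapped.append(gene)
        elif chrom == target_chrom:
            on_target.append(gene)
        else:
            # Gene sits on another reference chromosome: a hint that a new
            # contig may have been anchored onto this scaffold.
            off_target.append(gene)
    return on_target, off_target, unmapped


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4"]
    mapping = {"g1": "chr1", "g2": "chr1", "g3": "chr5"}
    print(flag_off_chromosome_genes(genes, mapping, "chr1"))
    # (['g1', 'g2'], ['g3'], ['g4'])
```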

How to use

scaco.py (Scaffold Comparison)

scaco directly compares two scaffolds based on their gene annotations. It maps the genes the two scaffolds have in common and highlights genes that are present on one scaffold but not the other. It also plots the gaps within the scaffolds. This is useful for comparing haplotype contigs of a de novo assembly.
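The gene-level comparison described above boils down to set logic over the two annotations. The sketch below is an illustration under stated assumptions, not scaco's implementation: it assumes homologous genes carry the same identifier on both scaffolds and that each ID occurs at most once per scaffold. `compare_scaffolds` is a hypothetical name:

```python
def compare_scaffolds(genes_a, genes_b):
    """Compare two scaffolds by the gene IDs annotated on them.

    genes_a, genes_b: gene IDs in scaffold order; each ID is assumed to
    occur at most once per scaffold (hypothetical input format).
    Returns a dict with:
      "shared": (gene, index_on_a, index_on_b) for genes on both scaffolds,
      "only_a": genes found only on scaffold A,
      "only_b": genes found only on scaffold B.
    """
    position_b = {gene: i for i, gene in enumerate(genes_b)}
    set_a = set(genes_a)
    shared = [(gene, i, position_b[gene])
              for i, gene in enumerate(genes_a) if gene in position_b]
    only_a = [gene for gene in genes_a if gene not in position_b]
    only_b = [gene for gene in genes_b if gene not in set_a]
    return {"shared": shared, "only_a": only_a, "only_b": only_b}


if __name__ == "__main__":
    result = compare_scaffolds(["g1", "g2", "g3", "g5"], ["g2", "g1", "g4", "g5"])
    print(result["shared"])  # [('g1', 0, 1), ('g2', 1, 0), ('g5', 3, 3)]
    print(result["only_a"])  # ['g3']
    print(result["only_b"])  # ['g4']
```

Each shared tuple gives the two positions to connect when drawing the scaffolds side by side; crossing connections (as with g1 and g2 here) may indicate a local rearrangement between the haplotypes.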

Here is an example output (image: example scaco output).

How to use

Installation

Getting the Files

# Go to where you want to have this tool
cd path/to/Project/directory
git clone https://github.com/pbieberstein/SCEVT.git SCEVT

Installing Python & Dependencies

These scripts were developed in Python 2.7.

The easiest way is to install Anaconda for Python 2.7 on your local machine and then install the dependencies via:

conda install biopython matplotlib==1.5.3 pandas
conda install --channel bioconda gffutils
# gmap for creating the feature mapping (output needs to be set to BLAT)
conda install -c bioconda gmap
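BLAT output is the PSL format: one tab-separated line with 21 columns per alignment. gmap can write it with its `-f psl` option (for example, something like `gmap -d <genome_db> -f psl genes.fasta > mapping.psl`; check `gmap --help` for your version). The sketch below shows how such a file could be read in Python, keeping the best hit per gene. It only illustrates the format and is not SCEVT's own parser; `parse_psl`, `PslHit` and `best_hit_per_query` are hypothetical names:

```python
import collections

# One alignment from a PSL (BLAT-format) file. Only a few of the 21 standard
# columns are kept: strand (col 9), qName (10), tName (14), tStart (16), tEnd (17).
PslHit = collections.namedtuple(
    "PslHit", ["query", "target", "t_start", "t_end", "strand", "matches"])


def parse_psl(lines):
    """Yield PslHit records from PSL text lines, skipping headers and blanks."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 21 or not fields[0].isdigit():
            continue  # "psLayout ..." header, column-name lines, dashes or blanks
        yield PslHit(query=fields[9], target=fields[13],
                     t_start=int(fields[15]), t_end=int(fields[16]),
                     strand=fields[8], matches=int(fields[0]))


def best_hit_per_query(hits):
    """Keep, per query (gene), the hit with the most matching bases.

    Using raw match count as the score is a simplification.
    """
    best = {}
    for hit in hits:
        if hit.query not in best or hit.matches > best[hit.query].matches:
            best[hit.query] = hit
    return best


if __name__ == "__main__":
    row = "\t".join(["950", "2", "0", "0", "0", "0", "1", "120", "+", "gene1",
                     "952", "0", "952", "chr3", "30000000", "1000", "2072",
                     "2", "500,452,", "0,500,", "1000,1620,"])
    for hit in parse_psl(["psLayout version 3", "", row]):
        print("%s %s %d %d" % (hit.query, hit.target, hit.t_start, hit.t_end))
        # gene1 chr3 1000 2072
```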

Alternatively, if you want to stay organized, we recommend installing Miniconda and then creating a new conda environment with the dependencies for this project (https://conda.io/docs/install/quick.html; additional conda help: https://conda.io/docs/_downloads/conda-cheatsheet.pdf).

cd path/to/Project/directory
conda create --prefix ./scevt-env python=2.7 biopython matplotlib==1.5.3 pandas
# This creates a new environment with Python 2.7, biopython, matplotlib and pandas inside the folder "scevt-env"
conda install --prefix ./scevt-env --channel bioconda gffutils gmap

Note: it's important to use matplotlib 1.5.3; otherwise SCEVT will run very slowly.
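Because the matplotlib version matters, it can help to check the active environment before running the scripts. The following is a small sketch, not part of SCEVT: the script and its function names are hypothetical, it uses the import names of the dependencies listed above (biopython imports as `Bio`), and it only flags matplotlib for a version mismatch since that is the only pinned version here:

```python
import importlib

# Import names of the dependencies listed above (biopython imports as "Bio").
DEPENDENCIES = ["Bio", "matplotlib", "pandas", "gffutils"]
# Only matplotlib has a pinned version in this README.
PINNED = {"matplotlib": "1.5.3"}


def installed_version(module_name):
    """Return __version__, "unknown" if the module has none, or None if missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def check_dependencies(get_version=installed_version):
    """Return a list of human-readable problems (empty if everything looks fine)."""
    problems = []
    for name in DEPENDENCIES:
        version = get_version(name)
        if version is None:
            problems.append("%s is not installed" % name)
        elif name in PINNED and version != PINNED[name]:
            problems.append("%s %s found, %s recommended"
                            % (name, version, PINNED[name]))
    return problems


if __name__ == "__main__":
    issues = check_dependencies()
    for issue in issues:
        print("WARNING: " + issue)
    if not issues:
        print("All SCEVT dependencies found.")
```

Passing a different `get_version` function keeps the check testable without the packages installed.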

Now, whenever you want to run SCEVT, first activate this environment via:

source activate ./scevt-env
# with newer conda versions: conda activate ./scevt-env

If you have just installed Anaconda or Miniconda, open a new terminal window first so that your PATH is updated, and activate the environment in that window.

Then you can run scaal and scaphy via:

cd path/to/Project/directory/SCEVT
cd Scripts
python scaal.py
# or
python scaphy.py

Progress:

  • scaal.py script is DONE
  • scaphy.py script is DONE
  • Documentation is DONE

This tool was written to assist in a de novo genome assembly project at ETH Zurich.

It is not actively maintained, but it should still be useful. If you have any questions, ideas, or concerns, please contact me.