Welcome to the Scorpio project! This repository contains tools for training triplet networks with contrastive learning on diverse DNA sequences, including data from promoter detection, phylogenomic analysis, antimicrobial resistance (AMR) detection, and any other source of hierarchical information. Models trained this way can improve downstream analysis and insights.
The GitHub Wiki also contains tutorials to help you learn how to use Scorpio tools with real data.
To train the gene-taxa model on full genes, download the data from our Zenodo record with the commands below. The data can then be used with the trainer to train and save the model:
wget https://zenodo.org/api/records/12175913/files-archive -O scorpio-gene-taxa.zip
unzip scorpio-gene-taxa.zip -d scorpio-gene-taxa
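The download and extraction steps above can be wrapped in a small helper so that re-running them does not fetch the archive again. This is only a sketch: the function name `fetch_and_extract` is hypothetical and not part of Scorpio, and it assumes `wget` and Info-ZIP `unzip` are installed.

```shell
# Hypothetical helper (not part of Scorpio): download an archive once,
# check its integrity, then extract it into a destination directory.
fetch_and_extract() {
  url="$1"
  archive="$2"
  dest="$3"

  # Skip the download if a non-empty archive is already present.
  if [ ! -s "$archive" ]; then
    wget "$url" -O "$archive" || return 1
  fi

  # -t tests the archive without extracting; -q keeps the output short.
  if ! unzip -tq "$archive"; then
    echo "archive failed integrity check: $archive" >&2
    return 1
  fi

  mkdir -p "$dest"
  # -o overwrites existing files so the extraction can be repeated safely.
  unzip -oq "$archive" -d "$dest"
}
```

With the URL and file names from the commands above, a call would look like `fetch_and_extract https://zenodo.org/api/records/12175913/files-archive scorpio-gene-taxa.zip scorpio-gene-taxa`.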
- ScorpioBigDynamic: https://zenodo.org/record/14176840
- ScorpioBigEmbed: https://zenodo.org/records/14176823
Our pre-trained model, MetaBERTa, a version of BigBird trained on gene sequences, is available here:
MetaBERTa-BigBird-Gene on Hugging Face
You can set up the environment for scorpio using either a conda environment or a Docker image. Follow the instructions below for your preferred method:
- Create a conda environment named scorpio based on the environment file in the src directory:
  conda env create -f src/environment.yml -n scorpio
- Activate the conda environment:
  conda activate scorpio
- Run the setup script to add scorpio to your PATH:
  ./src/setup.sh
- Download and run the Docker image:
  docker pull eesilab/scorpio
  docker run -it eesilab/scorpio
After following the steps for either method, your environment should be set up and ready to use the scorpio tool.
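A quick way to confirm the setup is to check that the conda environment is active and that scorpio resolves on your PATH. The sketch below is an illustration only: `check_scorpio_env` is a hypothetical name, and it relies on `CONDA_DEFAULT_ENV`, the variable conda sets when an environment is activated. Inside the Docker image that variable may not apply, so only the PATH check is treated as a failure.

```shell
# Hypothetical sanity check (not part of Scorpio).
check_scorpio_env() {
  # Report whether the conda environment named "scorpio" is active.
  if [ "${CONDA_DEFAULT_ENV:-}" = "scorpio" ]; then
    echo "conda environment: scorpio (active)"
  else
    echo "conda environment: not 'scorpio' (current: ${CONDA_DEFAULT_ENV:-none})"
  fi

  # Fail only if the scorpio command cannot be found on PATH.
  if command -v scorpio >/dev/null 2>&1; then
    echo "scorpio found at: $(command -v scorpio)"
  else
    echo "scorpio not found on PATH" >&2
    return 1
  fi
}
```

Running `check_scorpio_env` after `conda activate scorpio` and `./src/setup.sh` should print the resolved location of the scorpio command. A non-zero exit status suggests the setup script has not taken effect in the current shell.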
We encourage community involvement and welcome contributions to the Scorpio project (Report Issues, Submit Pull Requests, Join Discussions).
Please email us for any inquiries.
Maintainer: Saleh Refahi (sr3622 at drexel dot edu)
Owner: Gail Rosen (gailr26 at drexel dot edu)