The Virus SNP Analysis Tool (VSAT) is a concise toolkit for studying single nucleotide polymorphisms (SNPs) in viral genomes. It is tailored to data generated by first-generation sequencing.
- Set up your project directory
Begin by creating a directory for your project and navigating into it:
mkdir your_project
cd your_project
- Create a directory for raw data
Create a directory named rawdata to store your first-generation sequencing results (e.g., files with extensions such as *.seq, *.ab1, *.pdf):
mkdir rawdata
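Before moving on, it can help to confirm that rawdata actually contains the expected file types. The following is a minimal sketch, not part of VSAT; the helper name `count_raw_files` is hypothetical, and it only tallies files by extension without inspecting their contents.

```python
from collections import Counter
from pathlib import Path


def count_raw_files(rawdata_dir):
    """Count the regular files directly inside rawdata_dir by lowercase extension.

    Hypothetical helper for illustration; VSAT does not ship this function.
    """
    counts = Counter()
    for path in Path(rawdata_dir).iterdir():
        if path.is_file():
            # Normalise case so that sample.AB1 and sample.ab1 are counted together.
            counts[path.suffix.lower()] += 1
    return counts
```

For example, `count_raw_files("rawdata")` returns a `Counter` keyed by extension, so a missing `.ab1` or `.seq` entry is easy to spot.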
- Prepare the ID map file
Create an ID map file named id_map.xls. Populate this file using the following format:
-1_ sample1
-2_ sample2
-3_ sample3
-4_ sample4
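A quick way to catch malformed lines in the ID map is to parse it before running the assembly. The sketch below assumes that each line holds two whitespace-separated columns, as in the example above, and that the first column is unique. Both are assumptions, since the exact delimiter and semantics are not documented here. The function name `read_id_map` is hypothetical and is not part of VSAT.

```python
def read_id_map(path):
    """Parse a two-column ID map into a dict {first_column: second_column}.

    Assumption: columns are separated by whitespace (tab or space).
    Hypothetical helper for illustration; not VSAT's own parser.
    """
    mapping = {}
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            parts = line.split(None, 1)  # split on the first run of whitespace
            if len(parts) != 2:
                raise ValueError(f"line {line_number}: expected two columns, got {line!r}")
            key, sample = parts[0], parts[1].strip()
            if key in mapping:
                raise ValueError(f"line {line_number}: duplicate entry {key!r}")
            mapping[key] = sample
    return mapping
```

Calling `read_id_map("id_map.xls")` on the example above would yield one entry per sample; a `ValueError` points to the offending line.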
- Run the sequence assembly
Copy the assembly script located at vsat/step1.run_cap3_assemble.sh into your project directory. Modify parameters as necessary, then execute the script to perform sequence assembly:
python first_gen_snp/vsat/cap3_assemble.py \
--rawdata /path_to/your_project/rawdata/ \
--split_data /path_to/your_project/split_data \
--assemble_dir /path_to/your_project/assemble \
--id_map /path_to/your_project/id_map.xls
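If you prefer to drive this step from Python, the command above can be assembled from a single project path. This is a sketch under stated assumptions: the four flags and the script path are copied from the command above, while the helper names `build_assemble_command` and `missing_inputs` are hypothetical. It also assumes that rawdata and id_map.xls must exist beforehand and that the script creates split_data and assemble itself, which is not confirmed here.

```python
import os
from pathlib import Path


def build_assemble_command(project_dir, script="first_gen_snp/vsat/cap3_assemble.py"):
    """Return the assembly command as an argument list (hypothetical helper)."""
    project = Path(project_dir)
    return [
        "python", script,
        # os.path.join(..., "") keeps the trailing separator used in the example command.
        "--rawdata", os.path.join(str(project / "rawdata"), ""),
        "--split_data", str(project / "split_data"),
        "--assemble_dir", str(project / "assemble"),
        "--id_map", str(project / "id_map.xls"),
    ]


def missing_inputs(project_dir):
    """List the inputs that are assumed to be required but do not exist yet."""
    project = Path(project_dir)
    missing = []
    if not (project / "rawdata").is_dir():
        missing.append("rawdata")
    if not (project / "id_map.xls").is_file():
        missing.append("id_map.xls")
    return missing
```

Once `missing_inputs(project_dir)` returns an empty list, the command can be run with `subprocess.run(build_assemble_command(project_dir), check=True)`.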
- Post-assembly processing
After assembly, you will find the assembled sequences in the directory assemble//.cap.contigs.split. Download these sequences to your local machine and align them using software such as DNAMAN. Combine the alignment results of the *.ab1 files with the reference viral genome to guide the full assembly; the end result of this process is the complete viral genome.
Once you have completed the sequence assembly, you can proceed to call SNPs using the following command:
python vsat/get_snp.py -g <LOCUS, input> -s <ASSEMBLY, input>
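To illustrate what calling SNPs means conceptually, the sketch below compares an assembled sequence with a reference, position by position. It is not the implementation of get_snp.py, whose inputs and output format are not described here. It assumes the two sequences are already aligned to the same length, and the function name `find_snps` is hypothetical.

```python
def find_snps(reference, sample):
    """Return (position, ref_base, sample_base) for each single-base mismatch.

    Positions are 1-based. Gap characters ('-') and ambiguous bases ('N')
    are skipped rather than reported. Illustrative only; not VSAT's algorithm.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    pairs = zip(reference.upper(), sample.upper())
    for position, (ref_base, sample_base) in enumerate(pairs, start=1):
        if ref_base == sample_base:
            continue
        if ref_base in "-N" or sample_base in "-N":
            continue  # not a confident single nucleotide difference
        snps.append((position, ref_base, sample_base))
    return snps
```

For instance, `find_snps("ACGTACGT", "ACGAACGT")` reports a single difference at position 4, where the reference T is replaced by A.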