How to build a customized index
Since version 0.3, `SalmonTE` supports building a customized index. To build your own, follow the steps below.
A reference FASTA file of repeat elements is required to build the index. You can download it from RepBase or create it yourself.
If you decide to download the FASTA file, follow the steps below (please refer to the figure below):
- Select `Repeat class` based on your interest.
- Set the `Output format` to `FASTA`.
- Select `Include elements`.
- Select your species from `Taxon`.
- Click the `Download` button on the line that starts with `Your species only`.
  - Note: Click the other `Download` button if you also want to add ancestral repeats.
Note: You also need a RepBase account. Please visit this link to create one.
A user reported that the FASTA file of repeat sequences for some species is not available for download from RepBase. Without a reference FASTA file, however, you cannot use `SalmonTE`. We have not tried to create such a file ourselves, but we believe this link on how to create the file will be helpful. Once you create the file, the header of each sequence in it must look like the example below:
>B1 SINE1/7SL
gccgggcatggtggcgcacgcctttaatcccagcacttgggaggcagaggcaggcggatttctgagttcg
aggccagcctggtctacanagtgagttccaggacagccagggctacacagagaaaccctgtctcg
Please be aware that there is a tab character (not a space) between `B1` and `SINE1/7SL`. In other words, the repeat name and the class name on the header line must be separated by a tab character.
As in the example above, the first word must be the name of the sequence, and the second word must be a class from the hierarchy of classes in Censor. We only accept classes that are annotated in the database. You can see the list of classes at this link.
Please let us know if you succeed in generating the file. We can also help if you have any problems or questions about it.
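The header format described above can be checked before building the index. The following is a minimal sketch, not part of `SalmonTE`: the function name `check_fasta_headers` is hypothetical, and it only verifies that each header has the form `>name<TAB>class`. It does not check whether the class actually exists in the Censor hierarchy.

```python
from typing import Iterable, List, Tuple


def check_fasta_headers(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs for headers that are not '>name<TAB>class'.

    Hypothetical helper for illustration; SalmonTE itself does not provide it.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.startswith(">"):
            continue  # sequence line, not a header
        # Everything after '>' must be exactly two tab-separated fields.
        fields = line[1:].split("\t")
        if len(fields) != 2:
            problems.append((number, "expected exactly one tab between name and class"))
        elif not fields[0].strip() or not fields[1].strip():
            problems.append((number, "empty repeat name or class name"))
    return problems
```

To check a file, pass an open file handle, for example `check_fasta_headers(open("dr.fa"))`, and fix every reported line before running the `index` command. An empty list means all headers have the expected shape.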
Once you have the FASTA file, run the command below to generate the index:
./SalmonTE.py index --input_fasta=/Users/hwan/Downloads/dr.fa --ref_name=dr --te_only
Here is an explanation of each parameter:
- `--input_fasta`: The path to the FASTA file.
- `--ref_name`: An abbreviation for the reference. Use the name you put here as the value of the `--ref_name` parameter in `quant` mode.
- `--te_only`: If this option is enabled, the index is built with transposable elements only; otherwise, all of the repeat sequences in the FASTA file are included in the reference index.
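If you build indexes for several species, it can help to assemble the command from a script. The sketch below is an illustration only: `build_index_command` is a hypothetical helper, and it uses only the three options described above. The default script path `./SalmonTE.py` assumes you run it from the SalmonTE directory.

```python
from typing import List


def build_index_command(input_fasta: str, ref_name: str, te_only: bool = True,
                        script: str = "./SalmonTE.py") -> List[str]:
    """Assemble the argument list for the `index` mode (hypothetical helper)."""
    command = [
        script,
        "index",
        f"--input_fasta={input_fasta}",
        f"--ref_name={ref_name}",
    ]
    if te_only:
        # Restrict the index to transposable elements.
        command.append("--te_only")
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)` from the standard library. Remember to reuse the same `ref_name` value later in `quant` mode.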
Note: If you want to customize the annotation of classes/clades, please edit `clades_extended.csv` in the `reference` folder.