How to build a customized index

Hyun-Hwan Jeong edited this page Oct 11, 2018 · 7 revisions

From version 0.3, SalmonTE supports building a customized index. If you want to build your own, please follow the steps below.

1. Download or generate the reference FASTA file.

A reference FASTA file of repeat elements is required to build the index. You can download it from RepBase or create it yourself.

Download from RepBase

If you decide to download the FASTA file, follow the steps below (please refer to the figure below):

  1. Select Repeat class based on your interest.
  2. Set the Output format to FASTA.
  3. Select Include elements
  4. Select your species from Taxon.
  5. Click the Download button on the line that starts with Your species only.
  • Note: Click the other Download button if you also want to include ancestral repeats.

Note: You also need a RepBase account. Please visit this link to create one.

Create a FASTA file

A user reported that FASTA files of repeat sequences for some species are not available for download from RepBase, and without a reference FASTA file you cannot use SalmonTE. We have not tried creating such a file ourselves, but we believe a link describing how to create it will be helpful. Once you create the file, the header line of each sequence must look like this:

>B1	SINE1/7SL
gccgggcatggtggcgcacgcctttaatcccagcacttgggaggcagaggcaggcggatttctgagttcg
aggccagcctggtctacanagtgagttccaggacagccagggctacacagagaaaccctgtctcg

Please be aware that there is a tab character (white-space) between B1 and SINE1/7SL. In other words, the repeat name and the class name on the header line must be separated by a tab character.

As in the example above, the first word must be the name of the sequence, and the second word must be an element from the hierarchy of classes in Censor. We only accept an element that is annotated in the database. You can see the list of classes from this link.
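The header rule above can be checked before building the index. The following is a minimal sketch, not part of SalmonTE: the function name check_fasta_headers and the optional allowed_classes set are hypothetical, and the B2 record is only an illustrative malformed header. It reports every header line that lacks a tab-separated name and class, and, if you supply your own set of accepted class names, every class that is not in that set.

```python
from typing import Iterable, List, Optional, Set, Tuple


def check_fasta_headers(
    lines: Iterable[str],
    allowed_classes: Optional[Set[str]] = None,
) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for headers that break the format.

    Hypothetical helper: SalmonTE itself does not ship this function.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.startswith(">"):
            continue  # sequence line, nothing to check
        # Header fields must be separated by a tab: >name<TAB>class
        fields = line[1:].split("\t")
        if len(fields) < 2 or not fields[0].strip() or not fields[1].strip():
            problems.append((number, "expected '>name<TAB>class'"))
            continue
        class_name = fields[1].strip()
        # Optional check against a user-supplied set of class names (assumption).
        if allowed_classes is not None and class_name not in allowed_classes:
            problems.append((number, f"unknown class {class_name!r}"))
    return problems


example = [
    ">B1\tSINE1/7SL",
    "gccgggcatggtggcgcacgcctttaatcccagcacttgggaggcagaggcaggcggatttctgagttcg",
    ">B2 SINE2/tRNA",  # space instead of tab: reported as a problem
    "gggctggagagatggctcag",
]
for number, message in check_fasta_headers(example):
    print(f"line {number}: {message}")
```

To check a real file, pass an open file object, for example check_fasta_headers(open("dr.fa")), and fix every reported line before running index mode.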

Please let us know if you succeed in generating it. We can help if you have any problems or questions about it.

2. Running index mode

With the FASTA file ready, run the command below to generate the index:

./SalmonTE.py index --input_fasta=/Users/hwan/Downloads/dr.fa --ref_name=dr --te_only

Here is an explanation of each parameter:

  • --input_fasta - The path to the FASTA file.
  • --ref_name - The abbreviation for the reference. Use the same name as the value of the --ref_name parameter in quant mode.
  • --te_only - If this option is enabled, the index is built from transposable elements only; otherwise, all repeat sequences in the FASTA file are included in the reference index.
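If you build indexes for several species, it may help to assemble the command from a script. The sketch below uses only the three parameters described above; the helper names build_index_command and run_index are hypothetical, and it assumes the script is run from the directory that contains SalmonTE.py.

```python
import subprocess
from typing import List


def build_index_command(fasta_path: str, ref_name: str, te_only: bool = True) -> List[str]:
    """Assemble the index-mode command line (hypothetical helper)."""
    command = [
        "./SalmonTE.py",
        "index",
        f"--input_fasta={fasta_path}",
        f"--ref_name={ref_name}",
    ]
    if te_only:
        # Restrict the index to transposable elements.
        command.append("--te_only")
    return command


def run_index(fasta_path: str, ref_name: str, te_only: bool = True) -> None:
    """Run index mode and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_index_command(fasta_path, ref_name, te_only), check=True)


# Print the command instead of running it, so the example has no side effects.
print(" ".join(build_index_command("/Users/hwan/Downloads/dr.fa", "dr")))
```

Calling run_index("/Users/hwan/Downloads/dr.fa", "dr") would then execute the same command as shown above.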

Note: If you want to customize the annotation of classes/clades, please edit clades_extended.csv in the reference folder.