
Install.md


Class 1 Identification

How to install PhyLTR: Three steps

  1. Clone repository
  2. Install dependencies
  3. Download databases

*Optionally add pHMMs for domain annotation

Clone Repository

git clone https://github.com/mcsimenc/PhyLTR.git

Install dependencies

1. Install Python 3 and Biopython

2. Install the following however you can, then add the program paths to the CONFIG file in the PhyLTR root directory.

Parts of PhyLTR require only certain dependencies. See README.md for an explanation of dependency requirements for each process.

3. Edit CONFIG file and add dependency paths

Each line of the CONFIG file has the format key=path. The key must be written exactly as shown below. Depending on the dependency, the path must point either to the program file itself or to the directory containing the program.

bedtools=file # bedtools executable
mafft=file # mafft executable
fasttree=file # fasttree executable
trimal=file # trimal executable
jmodeltest2=file # jModelTest.jar
genometools=file # gt executable
geneconv=file # geneconv executable
paup=file # paup executable
rscript=file # Rscript executable
perl=file # perl executable
circos=file # circos executable
pathd8=file # PATHd8 executable
getorf=file # EMBOSS getorf executable
phylip=directory # the bin/ directory in the PHYLIP installation
hmmer=directory # the binaries/ directory in the HMMER3 installation
blast=directory # the bin/ directory in the BLAST+ installation
mcl=directory # the bin/ directory in the MCL installation
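
A quick way to catch typos in CONFIG is to check that every path exists. The function below is a minimal sketch, not part of PhyLTR: check_config is a hypothetical helper name, and it assumes each line is key=path with an optional trailing # comment, as in the listing above.

```shell
# check_config FILE: report whether the path of each key=path entry exists.
# Hypothetical helper, not shipped with PhyLTR. Assumes '#' starts a comment.
check_config() {
    config_file=$1
    while IFS='=' read -r key value || [ -n "$key" ]; do
        # Trim whitespace from the key; drop a trailing comment from the path.
        key=$(printf '%s' "$key" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        value=$(printf '%s' "$value" | sed 's/#.*$//; s/^[[:space:]]*//; s/[[:space:]]*$//')
        # Skip blank lines and comment-only lines.
        case $key in ''|'#'*) continue ;; esac
        if [ -e "$value" ]; then
            printf 'ok\t%s=%s\n' "$key" "$value"
        else
            printf 'MISSING\t%s=%s\n' "$key" "$value"
        fi
    done < "$config_file"
}
```

Run it as check_config PhyLTR/CONFIG and fix any line reported as MISSING. It only checks that the path exists, not that it is the right program or directory.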

4. Download databases: Dfam and Repbase

Dfam

A. Download the Dfam profile HMM library (Dfam.hmm) from Dfam.
B. Run: PhyLTR/scripts/DfamExtractLTRelements.py < Dfam.hmm > Dfam_ERV_LTR.hmm
C. Run: PhyLTR/scripts/Dfam3.xHMM2SuperFamTable.py < Dfam_ERV_LTR.hmm > Dfam_ERV_LTR.SF
D. Run: cut -f1 < Dfam_ERV_LTR.SF > Dfam_ERV_LTR.list
E. Move the files from B,C,D to the following locations:
PhyLTR/RepeatDatabases/Dfam/Dfam_ERV_LTR.hmm
PhyLTR/RepeatDatabases/Dfam/Dfam_ERV_LTR.SF
PhyLTR/RepeatDatabases/Dfam/Dfam_ERV_LTR.list
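
Steps B through E can be combined by writing each output directly to its final location. This is a sketch under assumptions: build_dfam_db is a hypothetical name, the two PhyLTR scripts are assumed to be executable (they are invoked directly in the steps above), and Dfam.hmm is assumed to be already downloaded and uncompressed.

```shell
# build_dfam_db PHYLTR_DIR DFAM_HMM: run Dfam steps B-E in one go.
# Hypothetical wrapper; script names and output paths are taken from the steps above.
build_dfam_db() {
    phyltr=$1       # path to the cloned PhyLTR repository
    dfam_hmm=$2     # path to the uncompressed Dfam.hmm
    dest="$phyltr/RepeatDatabases/Dfam"
    mkdir -p "$dest" || return 1
    # Step B: extract the LTR element profiles.
    "$phyltr/scripts/DfamExtractLTRelements.py" < "$dfam_hmm" \
        > "$dest/Dfam_ERV_LTR.hmm" || return 1
    # Step C: build the superfamily table from the extracted profiles.
    "$phyltr/scripts/Dfam3.xHMM2SuperFamTable.py" < "$dest/Dfam_ERV_LTR.hmm" \
        > "$dest/Dfam_ERV_LTR.SF" || return 1
    # Step D: the list file is the first tab-delimited column of the table.
    cut -f1 < "$dest/Dfam_ERV_LTR.SF" > "$dest/Dfam_ERV_LTR.list" || return 1
}
```

For example: build_dfam_db ./PhyLTR ./Dfam.hmm. The function stops at the first failing step, so a non-zero exit status means a later file may be missing.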

Repbase

A. Get an account with GIRI (no longer free).
  1. Go to http://www.girinst.org/repbase/update/browse.php
  2. Select LTR Retrotransposon from the Repeat class dropdown list.
  3. Select FASTA from the Output format dropdown list.
  4. Click the Download button, sign in, and download the text page that opens.
  5. Repeat steps 2-4 but select Endogenous Retrovirus from the Repeat class dropdown list.
  6. Run: cat <LTR.fa> <ERV.fa> > Repbase_ERV_LTR.fasta
  7. Move the new file from 6 to: PhyLTR/RepeatDatabases/Repbase/Repbase_ERV_LTR.fasta
B. Select IG from the Output format dropdown list, download the LTR Retrotransposon and Endogenous Retrovirus entries in IG format, then concatenate both files as in A.6 into Repbase.LTR-ERV-concatenated.IG
C. Run: PhyLTR/scripts/RepbaseIG2superfamilies.py < Repbase.LTR-ERV-concatenated.IG > Repbase_ERV_LTR.SF
D. Run: cut -f1 < Repbase_ERV_LTR.SF > Repbase_ERV_LTR.list
E. Move the files from A, C, D to the following locations:
PhyLTR/RepeatDatabases/Repbase/Repbase_ERV_LTR.fasta
PhyLTR/RepeatDatabases/Repbase/Repbase_ERV_LTR.SF
PhyLTR/RepeatDatabases/Repbase/Repbase_ERV_LTR.list
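
When concatenating the two Repbase downloads, it can help to confirm that no records were lost or doubled. The sketch below uses a hypothetical helper, concat_and_count. It writes with > rather than >> so that re-running it replaces the output instead of appending to it, then prints the number of FASTA header lines.

```shell
# concat_and_count OUT IN...: concatenate the input files into OUT (overwriting
# OUT if it exists) and print the number of FASTA header lines in OUT.
# Hypothetical helper, not part of PhyLTR.
concat_and_count() {
    out=$1
    shift
    cat "$@" > "$out" || return 1
    # A FASTA record starts with '>' in the first column.
    grep -c '^>' "$out"
}
```

For example, concat_and_count Repbase_ERV_LTR.fasta LTR.fa ERV.fa, where LTR.fa and ERV.fa stand for the two downloaded files. The printed count should equal the sum of the record counts of the inputs.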

5. Add pHMMs for domain annotation (optional)

Append any HMMs you want to include to PhyLTR/LTRdigest_HMMs/hmm

The version included in the repository contains pHMMs for TE-related domains from Pfam and from gydb.org, downloaded in summer 2018.
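
Appending can be sketched as follows, assuming PhyLTR/LTRdigest_HMMs/hmm is a single file of concatenated HMMER3 text profiles, each of which carries a NAME line. append_hmms is a hypothetical helper. It reports the profile count after appending and lists any profile name that occurs more than once, which may indicate that the same file was appended twice.

```shell
# append_hmms TARGET FILE...: append HMM files to TARGET, then print the number
# of profiles in TARGET and any profile names that occur more than once.
# Hypothetical helper, not part of PhyLTR.
append_hmms() {
    target=$1
    shift
    cat "$@" >> "$target" || return 1
    # Each HMMER3 text profile has exactly one NAME line.
    printf 'profiles: %s\n' "$(grep -c '^NAME[[:space:]]' "$target")"
    awk '$1 == "NAME" { print $2 }' "$target" | sort | uniq -d | sed 's/^/duplicate: /'
}
```

For example: append_hmms PhyLTR/LTRdigest_HMMs/hmm my_domains.hmm, where my_domains.hmm is a placeholder for your own file. Keeping a backup copy of the original hmm file before appending makes it easy to undo a mistake.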

NonLTR

Class 2 Identification

Repbase

A. Get an account with GIRI (no longer free).
  1. Go to http://www.girinst.org/repbase/update/browse.php
  2. Select DNA Transposons from the Repeat class dropdown list.
  3. Select FASTA from the Output format dropdown list.
  4. Click the Download button, sign in, and download the text page that opens.
  5. Repeat steps 2-4 but select Helitrons from the Repeat class dropdown list.
  6. Run tblastn against each corresponding class separately.
  7. Extract best hits using best_blast_hit.py
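
The best-hit step can be illustrated without best_blast_hit.py, whose interface is not described here. The sketch below is a stand-in, not that script: it assumes tblastn was run with tabular output (-outfmt 6), where column 1 is the query ID and column 12 is the bit score, and keeps the highest-scoring line per query. The function name and file names are hypothetical.

```shell
# Hypothetical tblastn call producing tabular output (file names are placeholders):
#   tblastn -query <proteins.fa> -subject <sequences.fa> -outfmt 6 -out hits.tsv

# best_hits FILE: for each query (column 1) in BLAST tabular output, print the
# line with the highest bit score (column 12). Ties keep the first line seen;
# queries are printed in order of first appearance.
best_hits() {
    awk -F '\t' '
        !($1 in score) { order[++n] = $1 }        # remember first-seen query order
        !($1 in score) || $12 + 0 > score[$1] {   # first hit, or a better one
            score[$1] = $12 + 0
            line[$1] = $0
        }
        END { for (i = 1; i <= n; i++) print line[order[i]] }
    ' "$1"
}
```

Run it once per class, for example best_hits hits.tsv > best_hits.tsv. Ranking by bit score is one common choice. Ranking by e-value (column 11) would need the comparison reversed.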

De novo Identification

Annotation