applicativesystem/miniprot-protein-annotator

miniprot-protein-annotator

  • A protein-coding region annotator: it takes an alignment file in PAF or GFF format and writes the aligned regions, extracted from the corresponding FASTA file, to a new FASTA file.
  • Implemented for speed, so you can parse as many aligned regions as you want.
  • The extracted regions can also be used to build a protein tokenizer for machine learning.
 # align your proteins to the genome with miniprot, for example:
   miniprot --gff genome.fasta protein.fasta > sample.gf
  • Then run the protein annotator to extract all the complete coding regions:
generatingAlignments("/home/gaurav/Desktop/final_code_push/multi.gff",
                     "/home/gaurav/Desktop/final_code_push/multi.fasta",
                     "/home/gaurav/Desktop/final_code_push/multiout.fasta")
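
For readers who want to see what such an extraction step involves, the following is a minimal Python sketch. It is not the code of this repository: the function names (`read_fasta`, `read_cds_features`, `extract_coding_regions`) are hypothetical, and it assumes the GFF file has tab-separated GFF3 rows whose `CDS` features carry a `Parent` attribute and whose first column matches a sequence name in the genome FASTA.

```python
from collections import defaultdict

# Translation table used to reverse-complement minus-strand regions.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(path):
    """Return {sequence_id: sequence} for a FASTA file (id = first word of header)."""
    sequences, name, chunks = {}, None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    sequences[name] = "".join(chunks)
                header = line[1:].split()
                name, chunks = (header[0] if header else ""), []
            elif line:
                chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def read_cds_features(path):
    """Group the CDS rows of a GFF3 file by their Parent attribute (assumed present)."""
    features = defaultdict(list)
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip comment, pragma, and blank lines
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 9 or cols[2] != "CDS":
                continue
            attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
            parent = attrs.get("Parent", attrs.get("ID", "unknown"))
            features[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return features


def extract_coding_regions(gff_path, fasta_path, out_path):
    """Write one joined coding sequence per Parent; return the number written."""
    genome = read_fasta(fasta_path)
    written = 0
    with open(out_path, "w") as out:
        for parent, parts in read_cds_features(gff_path).items():
            contig, strand = parts[0][0], parts[0][3]
            if contig not in genome:
                continue  # alignment refers to a sequence missing from the FASTA
            parts.sort(key=lambda p: p[1])
            # GFF coordinates are 1-based and inclusive.
            seq = "".join(genome[contig][start - 1:end] for _, start, end, _ in parts)
            if strand == "-":
                seq = seq.translate(COMPLEMENT)[::-1]
            out.write(f">{parent} {contig} {strand}\n{seq}\n")
            written += 1
    return written
```

Under these assumptions, `extract_coding_regions("multi.gff", "multi.fasta", "multiout.fasta")` would mirror the call above. The sketch loads the whole genome into memory, which keeps it short but may not suit very large genomes.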

Gaurav Sablok
University of Potsdam
Potsdam, Germany