
Input data

Freya Arthen edited this page Feb 20, 2024 · 2 revisions
  • genomic FASTA file
    • nucleotide sequence of the assembly, divided into contigs/scaffolds
  • gene annotation as a sorted GFF3 file that follows the GFF3 format specification
  • proteins FASTA file (optional)
    • protein sequences for all protein coding genes in the data set
      • headers must start with the ID attribute of the gene, mRNA, or CDS feature in the GFF file
      • any text after the first space or pipe character (|) is ignored
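      • a file mapping proteins to genes can be provided with the parameter 'prot2gene_mapper' (format: one 'protein_header' and 'gene_id' pair per line; the separator between the two fields was lost on this page and is presumably a tab)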
    • extracted by the taXaminer pipeline if not provided
  • coverage information as a sorted BAM file (optional)
    • multiple mapping files can be provided
  • config file
    • YAML format
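
The header rule for the proteins FASTA file can be checked before running the pipeline. The following is a minimal sketch, not taXaminer's own code: the function name `protein_id_from_header` and the example headers are hypothetical, and it only assumes the rule stated above, namely that everything after the first space or pipe is ignored.

```python
import re


def protein_id_from_header(header: str) -> str:
    """Return the part of a FASTA header that is matched against GFF IDs.

    Hypothetical helper: mirrors the rule described on this page,
    not the pipeline's actual implementation.
    """
    # Drop the leading '>' if present, then cut at the first space or pipe.
    return re.split(r"[ |]", header.lstrip(">"), maxsplit=1)[0]


# Made-up example headers; the resulting IDs should equal the ID attribute
# of a gene, mRNA, or CDS feature in the GFF file.
for header in (">g1.t1 hypothetical protein", ">cds-XP_1|sample annotation"):
    print(protein_id_from_header(header))
```

If the IDs obtained this way do not appear as gene, mRNA, or CDS IDs in the GFF file, a mapping file passed via 'prot2gene_mapper' is the alternative described above.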