forked from fdarthen/taXaminer
Input data
Freya Arthen edited this page Feb 20, 2024 · 2 revisions
- genomic FASTA file
  - nucleotide sequence of the assembly, divided into contigs/scaffolds
- gene annotation as a sorted GFF3 file following the format standards
- proteins FASTA file (optional)
  - protein sequences for all protein-coding genes in the data set
  - headers must start with the ID attribute of either the gene, mRNA, or CDS feature in the GFF file
    - any text after the first space or pipe will be ignored
    - a mapping file of protein to gene can be provided with the parameter 'prot2gene_mapper' (format: 'protein_header' and 'gene_id' separated by a tab)
  - will be extracted within the taXaminer pipeline if not provided
- coverage information as a sorted BAM file (optional)
  - multiple mapping files can be provided
- config file
  - YAML format
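
The header rule for the proteins FASTA file (only the text before the first space or pipe is used as the ID) can be checked before running the pipeline. The following is a minimal sketch and not taXaminer's own code: the function names `header_to_id` and `read_prot2gene` are hypothetical, and the tab separator for the mapping file is an assumption.

```python
import re


def header_to_id(header: str) -> str:
    """Return the part of a FASTA header before the first space or pipe."""
    # Drop the leading '>' and surrounding whitespace, then cut at ' ' or '|'.
    cleaned = header.lstrip(">").strip()
    return re.split(r"[ |]", cleaned, maxsplit=1)[0]


def read_prot2gene(lines, sep="\t"):
    """Build a protein-ID -> gene-ID dict from two-column lines.

    The separator is an assumption of this sketch; adjust `sep` if the
    mapping file uses a different one.
    """
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty lines
        protein, gene = line.split(sep, 1)
        mapping[header_to_id(protein)] = gene.strip()
    return mapping
```

For example, `header_to_id(">gene1|extra info")` gives `"gene1"`, which should then match an ID attribute of a gene, mRNA, or CDS feature in the GFF file.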