Understanding of --proteins flag #169
Hi,
Hi and thanks for this question! The --proteins option does not interfere with gene prediction, regional annotation or filtering; it only affects the functional annotation of predicted protein-coding genes (CDS). Users can provide a lot of information via two simple Fasta header schemas. Via the short schema, users can provide a gene symbol, a product description and database cross-references. In this case, the standard alignment thresholds are used, i.e. 90% sequence identity and 80% mutual query/subject coverage.
If required or desired, these alignment thresholds can be adapted via a longer and more expressive schema that additionally carries per-sequence thresholds.
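To make the two header variants more concrete, here is a minimal Python sketch of how such headers could be parsed. It is only an illustration and not Bakta's own code. It assumes the `~~~`-separated layout as I recall it from the Bakta README (short: `gene~~~product~~~dbxrefs`, long: `min_identity~~~min_query_cov~~~min_subject_cov~~~gene~~~product~~~dbxrefs`), so please check the README before relying on it. The names `UserProtein` and `parse_header` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Assumption: header fields are separated by '~~~' (please verify in the Bakta README).
SEP = "~~~"
# Default thresholds as stated in the answer: 90% identity, 80% query/subject coverage.
DEFAULT_MIN_IDENTITY = 90.0
DEFAULT_MIN_COVERAGE = 80.0


@dataclass
class UserProtein:
    """Hypothetical container for the information carried by one Fasta header."""
    id: str
    gene: str
    product: str
    dbxrefs: List[str]
    min_identity: float = DEFAULT_MIN_IDENTITY
    min_query_cov: float = DEFAULT_MIN_COVERAGE
    min_subject_cov: float = DEFAULT_MIN_COVERAGE


def parse_header(header: str) -> UserProtein:
    """Parse a short (3 fields) or long (6 fields) header line."""
    # Split the sequence id from the description at the first space only,
    # so that product descriptions may contain spaces.
    seq_id, _, description = header.lstrip(">").strip().partition(" ")
    cols = description.split(SEP)
    thresholds = {}
    if len(cols) == 3:  # short schema: default thresholds apply
        gene, product, dbxrefs = cols
    elif len(cols) == 6:  # long schema: thresholds come first
        min_id, min_qcov, min_scov, gene, product, dbxrefs = cols
        thresholds = {
            "min_identity": float(min_id),
            "min_query_cov": float(min_qcov),
            "min_subject_cov": float(min_scov),
        }
    else:
        raise ValueError(f"unexpected number of fields in header: {header!r}")
    return UserProtein(
        id=seq_id,
        gene=gene.strip(),
        product=product.strip(),
        # Assumption: multiple cross-references are comma-separated.
        dbxrefs=[x.strip() for x in dbxrefs.split(",") if x.strip()],
        **thresholds,
    )


short = parse_header(">id-1 yadD~~~IS3 family transposase~~~VFDB:VFG1234")
long = parse_header(">id-2 95~~~85~~~85~~~yadD~~~transposase~~~VFDB:VFG1234,VFDB:VFG5678")
```

With the short header, the three thresholds silently fall back to the defaults; with the long header they are read per sequence.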
These sequences are then used as a user-provided protein expert annotation system.
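As a rough illustration of what the thresholds mean in practice, the following sketch checks whether a single alignment hit would qualify. It is not taken from Bakta. The function name `passes_thresholds` and the way coverage is computed here (alignment length divided by sequence length) are assumptions; only the default values of 90% identity and 80% query/subject coverage come from the answer above.

```python
def passes_thresholds(
    identity: float,
    alignment_length: int,
    query_length: int,
    subject_length: int,
    min_identity: float = 90.0,
    min_query_cov: float = 80.0,
    min_subject_cov: float = 80.0,
) -> bool:
    """Return True if a hit meets identity and mutual coverage thresholds.

    All percentages are on a 0-100 scale. Coverage is assumed to be the
    aligned length relative to the full query or subject length.
    """
    query_cov = 100.0 * alignment_length / query_length
    subject_cov = 100.0 * alignment_length / subject_length
    return (
        identity >= min_identity
        and query_cov >= min_query_cov
        and subject_cov >= min_subject_cov
    )


# A 300 aa CDS aligned over 270 positions to a 310 aa user protein.
good_hit = passes_thresholds(92.5, 270, 300, 310)
# Same identity, but only 200 aligned positions: query coverage is too low.
short_hit = passes_thresholds(92.5, 200, 300, 310)
```

Requiring coverage on both the query and the subject side is what "mutual" coverage refers to: a short fragment matching part of a long protein (or vice versa) would not qualify.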
The --proteins option is not used to filter CDS but to improve their annotation. If users have a trusted set of proteins with decent annotations that are either not included in the standard database or for which they have better (more descriptive, more specific) annotations, the --proteins option offers a simple mechanism to feed these into Bakta's normal functional annotation workflow.