Revise truncated pseudo attributes #333
Merged
Bakta and the various standard output formats (GenBank, EMBL, GFF3) use slightly different terms and approaches to declare truncated genes and pseudogenes.
In Bakta, a feature is declared as truncated if a downstream analysis tool (e.g. Pyrodigal, Infernal) reports it as such.
Besides these, Bakta accepts true pseudogenes from tRNAscan-SE and from its own internal CDS workflow.
To strictly follow INSDC specs, for GenBank, EMBL and GFF3 output files (#330), Bakta now declares all truncated features as `pseudo`, reflecting technical issues like sequencing and assembly errors on the one side, and true `pseudogenes` on the other side, emerging from biological pseudogenization events like InDels and mutations.
Internally, Bakta uses `truncated` and `pseudogene` attributes to reflect the different states.
In the human-readable `TSV` output file (meant for a quick glimpse), Bakta adds the feature product prefixes `(pseudo)`, `(truncated)`, `(5' truncated)` and `(3' truncated)`.
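
As a rough illustration of the TSV prefixing described above, here is a minimal sketch, not the code from this PR. It assumes a feature is a plain dict, that a truthy `pseudogene` attribute marks a true pseudogene, and that the `truncated` attribute holds one of `5-prime`, `3-prime` or `both`. The helper name `tsv_product`, the lookup table and those attribute values are hypothetical.

```python
# Assumed encoding of the internal `truncated` attribute (hypothetical values).
TRUNCATION_PREFIXES = {
    "5-prime": "(5' truncated)",
    "3-prime": "(3' truncated)",
    "both": "(truncated)",
}


def tsv_product(feature: dict) -> str:
    """Return the product string with a TSV prefix, if one applies."""
    product = feature.get("product", "")
    if feature.get("pseudogene"):
        # True pseudogene (biological pseudogenization event).
        prefix = "(pseudo)"
    else:
        # Truncated feature (technical issue), or None if not truncated.
        prefix = TRUNCATION_PREFIXES.get(feature.get("truncated"))
    return f"{prefix} {product}" if prefix else product


if __name__ == "__main__":
    print(tsv_product({"product": "transposase", "truncated": "5-prime"}))
    print(tsv_product({"product": "transposase", "pseudogene": True}))
    print(tsv_product({"product": "transposase"}))
```

In this sketch the pseudogene check takes precedence over truncation. That ordering is a design assumption, chosen so that a feature never carries two prefixes.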