update doc. Fix #458
Juke34 committed Jun 3, 2024
1 parent 5f590b6 commit 43a0919
Showing 1 changed file with 42 additions and 4 deletions.
46 changes: 42 additions & 4 deletions bin/agat_sp_fix_cds_phases.pl
@@ -81,16 +81,54 @@
=head1 NAME
-agat_sp_fix_cds_frame.pl
+agat_sp_fix_cds_phases.pl
=head1 DESCRIPTION
-This script aims to fix the cds phases.
+This script aims to fix the CDS phases.
The script is compatible with incomplete gene models (missing start codon, CDS
length a multiple of 3 or not, i.e. with an offset of 1 or 2) and with both + and - strands.
How does it work?
AGAT uses the fasta sequence to verify the CDS frame.
If the CDS starts with a start codon, the phase of the first CDS piece is set to 0.
If there is no start codon:
  - If there is only one stop codon in the sequence and it is located at the last position, the phase of the first CDS piece is set to 0.
  - If there is no stop codon, the phase of the first CDS piece is set to 0 (because the sequence can be translated without a premature stop codon).
  - If there are stop codons in the middle of the sequence, the check is re-executed with an offset of +2 nucleotides:
    - If there is only one stop codon in the sequence and it is located at the last position, the phase of the first CDS piece is set to 0.
    - If there is no stop codon, the phase of the first CDS piece is set to 0 (because the sequence can be translated without a premature stop codon).
    - If there are stop codons in the middle of the sequence, the check is re-executed with an offset of +1 nucleotide:
      - If there is only one stop codon in the sequence and it is located at the last position, the phase of the first CDS piece is set to 0.
      - If there is no stop codon, the phase of the first CDS piece is set to 0 (because the sequence can be translated without a premature stop codon).
      - If there are still stop codons, the original phase is kept and a warning is thrown. In this last case, no translation without a premature stop codon was possible in any of the 3 possible phases.
Then, if the CDS is made of multiple CDS pieces (i.e. a discontinuous feature), the remaining CDS pieces are checked according to the first CDS piece.
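The search order described above can be sketched in a few lines of Python. This is only an illustration, not AGAT's Perl implementation: the function names are hypothetical, it assumes the standard genetic code (stop codons TAA, TAG, TGA; ATG as the only start codon), and it expects a spliced CDS sequence already oriented 5' to 3'. It returns the offset at which translation succeeds and leaves the phase value actually written to the GFF to the tool.

```python
# Hypothetical sketch of the offset search (not AGAT's actual code).
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def has_premature_stop(seq, offset):
    """Return True if a stop codon occurs before the last full codon."""
    codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
    # A stop codon in the last position is allowed, so skip the last codon.
    return any(codon in STOP_CODONS for codon in codons[:-1])


def first_clean_offset(seq):
    """Return the first offset (tried in the order 0, +2, +1) that gives a
    translation without a premature stop codon, or None if all three fail."""
    seq = seq.upper()
    if seq.startswith("ATG"):  # a start codon settles the question
        return 0
    for offset in (0, 2, 1):
        if not has_premature_stop(seq, offset):
            return offset
    return None  # caller keeps the original phase and emits a warning


print(first_clean_offset("AAATAGAAACCC"))  # stop in frame 0, clean at offset +2
```

A `None` result corresponds to the last case of the description, where the original phase is kept and a warning is raised.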
What is a phase?
For features of type "CDS", the phase indicates where the next codon begins
relative to the 5' end (where the 5' end of the CDS is relative to the strand
of the CDS feature) of the current CDS feature. For clarification, the 5' end
for CDS features on the plus strand is the feature's start and the 5' end
for CDS features on the minus strand is the feature's end. The phase is one of
the integers 0, 1, or 2, indicating the number of bases forward from the start
of the current CDS feature the next codon begins. A phase of "0" indicates that
a codon begins on the first nucleotide of the CDS feature (i.e. 0 bases forward),
a phase of "1" indicates that the codon begins at the second nucleotide of this
CDS feature and a phase of "2" indicates that the codon begins at the third
nucleotide of this region. Note that "Phase" in the context of a GFF3 CDS
feature should not be confused with the similar concept of frame that is also a
common concept in bioinformatics. Frame is generally calculated as a value for
a given base relative to the start of the complete open reading frame (ORF) or
the codon (e.g. modulo 3) while CDS phase describes the start of the next codon
relative to a given CDS feature.
The phase is REQUIRED for all CDS features.
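As an illustration of the definition above, once the phase of the first CDS piece is known, the phases of the following pieces follow from the piece lengths alone. The sketch below is a hypothetical helper, not part of AGAT; it assumes the pieces are listed in 5' to 3' order (for the minus strand, from the highest coordinate to the lowest) and that each length is `end - start + 1`.

```python
# Hypothetical helper: derive the phase of each CDS piece from its length.
def propagate_phases(lengths, first_phase=0):
    """Return one phase per CDS piece, given the piece lengths in 5'->3' order."""
    phases = []
    phase = first_phase
    for length in lengths:
        phases.append(phase)
        # Bases left over after the last complete codon of this piece ...
        leftover = (length - phase) % 3
        # ... determine how many bases the next piece must skip to reach a codon start.
        phase = (3 - leftover) % 3
    return phases


print(propagate_phases([10, 5, 9]))  # [0, 2, 0]
```

In this example the first piece holds three codons plus one extra base, so the next piece must contribute two bases to complete that codon, which gives it a phase of 2.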
=head1 SYNOPSIS
-agat_sp_fix_cds_frame.pl --gff infile.gff -f fasta [ -o outfile ]
-agat_sp_fix_cds_frame.pl --help
+agat_sp_fix_cds_phases.pl --gff infile.gff -f fasta [ -o outfile ]
+agat_sp_fix_cds_phases.pl --help
=head1 OPTIONS